The spectre of smallpox lingers


The last known person to die from the virus was infected 40 years ago. Yet the disease remains a worry, and precautions should continue.


Forty years ago, the UK city of Birmingham was in a panic. A medical photographer called Janet Parker was admitted to hospital on 11 August 1978, her body riddled with lesions and scars. She passed away a month later, the last known person to die from one of the world’s deadliest diseases: smallpox. In 1980, the World Health Organization declared that the illness had been eradicated. So it might raise eyebrows that the US Food and Drug Administration (FDA) last month approved the first ever drug to treat it.

The antiviral, called TPOXX (tecovirimat) and made by SIGA Technologies of New York City, inhibits proteins in human cells that allow poxviruses to replicate and spread. TPOXX hasn’t actually been tested against smallpox in humans. Rather, the company trialled it in monkeys and rabbits infected with monkeypox and rabbitpox, respectively, and found that it allowed more than 90% of the animals to survive (D. W. Grosenbach et al. N. Engl. J. Med. 379, 44–53; 2018). This was enough for the FDA to approve it for human use, given that clinical trials of such a deadly killer are effectively impossible.

Another antiviral drug, called brincidofovir, isn’t far behind. In June, the FDA granted the drug’s developer, Chimerix of Durham, North Carolina, fast-track review by designating it an orphan drug: a therapy for a disease with (all involved surely hope) a small market.

The approvals put the United States on track to fulfil criteria for bioweapons defence drafted in response to the 2001 anthrax attacks that killed five people. In 2008, the US Institute of Medicine recommended that the country stockpile smallpox vaccines and develop at least two antiviral drugs that attack the virus through different molecular mechanisms, to prevent the virus from becoming resistant.

Although it might seem like a waste of resources to develop cures for an extinct disease, smallpox resurgence does remain a realistic threat.

The variola virus that causes the disease officially exists in only two highly secured places: the US Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia, and its Russian counterpart, VECTOR, near Novosibirsk. Every few years, the World Health Assembly debates whether to destroy these last two stocks, and each time, experts postpone the decision. Researchers still use them and, they say, that work should continue, given that we can never be certain the disease is dead.

While the reserves remain, some warn that a disgruntled employee could release the virus, which is what investigators think happened with the anthrax attacks. Or it could be spread through a lab accident. That’s what happened to Janet Parker, who worked directly above a university lab that studied the smallpox virus. Investigators later determined that although the researchers had protected themselves, the disease had drifted through air ducts.

The CDC and VECTOR are also unlikely to be the only possible sources. In 2014, the US National Institutes of Health (NIH) discovered live stocks in a storage room on its campus in Bethesda, Maryland. If the venerable and highly regulated NIH could lose track of smallpox, other institutions could have some forgotten vials as well. Even more worrying is the prospect that the virus lives on in freezers in former Soviet states, as US intelligence experts fear, and in countries with illicit bioweapons programmes.

Smallpox might also lie dormant in mummies and corpses of people who died from the disease. As the Arctic permafrost melts, accidental encounters with previously frozen diseases could become more common.

The greatest threat is advances in synthetic biology, which could permit a rogue lab to re-engineer a smallpox virus. In 2016, researchers in Canada announced that they had created horsepox using pieces of DNA ordered from companies. A synthetic smallpox virus could be even more dangerous than the original, because it could be designed to spread more easily or with ways to survive new therapies.

The eradication of smallpox remains one of humanity’s most impressive achievements, one that has saved innumerable lives. But it has also left people unvaccinated and vulnerable. New therapies and continued funding for national vaccine stockpiles serve as a crucial insurance policy, just in case.


Blurred lines 


An in-depth analysis reveals the scale of differences within common cell lines.


Cultures of cell lines are frequently described as the workhorses of medical research and, like all animals, these ubiquitous biological tools can have their differences. Experiments with cells that work in one lab, or even in one batch, can fail or give different results in another. This can be down to mislabelling, and even contamination (see Nature 520, 264; 2015). In Nature this week, scientists highlight another reason: genetic variability in the same cell line (see page 325).

The researchers analysed cell lines from different sources, including the popular MCF-7 type used to investigate cancer. They found a staggering amount of variation. Supposedly identical cell lines were actually far from identical: there were differences in their genomes, gene expression, morphology and, importantly, drug sensitivity. This could help to explain why so many experimental results that rely on the standard performance of a specific cell line can’t be reproduced.

The cause is genetic drift: the cell lines diverge as they are independently and individually passaged, or subcultured. It isn’t an unexpected finding, but it has never been shown before on such a scale, and it will still probably come as a shock to many researchers. All cell-line strains are not equal. And some are more unequal than others.

Funders must be wary of industry alliances

Granting bodies have to do more to stop corporate money from distorting science, says Linda Bauld.

Do you consider yourself a type A personality: ambitious and impatient? If so, you can thank the tobacco industry for that bit of self-knowledge. For decades, the cigarette companies Philip Morris International and R. J. Reynolds supported and promoted studies linking a driven personality to an increased risk of heart disease. The apparent motivation? To raise questions over smoking as a contributor. Subsequent research by scientists without funding from the tobacco industry did not link type A personalities to higher rates of disease or death.

Copious case studies document how industry influence can muddy research on the health impacts of soda, tobacco, fossil fuels and more, but researchers are largely unaware of this. It’s time for research funders to integrate this information and consider vested interests as a force in the complex research system.

Sometimes, partnerships and licensing are essential. Few drugs, for example, would reach patients without the pharmaceutical industry investing in late-stage trials. And plenty of scientific instruments, computing technologies and household products have come from industry-supported researchers. Still, the scientific community must be wary. In my view, industry influence over health research is waxing, not waning.

In June, Francis Collins, director of the US National Institutes of Health (NIH), cancelled a US$100-million study on the effects of moderate drinking. An advisory committee concluded that interactions between government staff and representatives of the alcoholic-beverage industry seemed to be intended to bias the study to find beneficial health effects, and that it was not designed to detect potential harms such as cancer or heart failure.

The NIH is now working to clarify processes for private-sector collaboration, with a plan expected by the end of the year. Funders everywhere should take this as motivation to become more aware of how commercial bias enters the research system, and to learn how to offset it.

Some measures to counter bias are in place. Authors are now routinely asked to disclose conflicts of interest in research papers. And to prevent companies from ‘burying’ studies that fail to show patient benefit, many countries demand that clinical trials be registered before they start.

These measures are inadequate, and not just because they are enforced imperfectly. Commercial interests can influence research in ways that are largely unmonitored. Industry funding for research on topics such as food, alcohol and tobacco might not directly bias particular studies, but instead encourage a focus on short-term impacts or healthy people, showing less harm. Another tactic is to provide ‘unrestricted’ grants to researchers whose previous work has aligned with the interests of the industry. In these cases, the researchers may well be acting in good faith, but the industry creates an uneven playing field by providing disproportionate funding to a subset of researchers. Other tactics include possible smokescreens and distractions. For example, in 2017, Philip Morris pledged funds to establish a grant-giving foundation to end smoking, raising all sorts of questions. (The foundation says that, by law, it operates independently of its benefactor.)

Individual funders often have policies or frameworks that are intended to curb industry bias, but these vary widely and are confusing. We need a broad discussion about how industry can support research without skewing it.

Researchers need to reflect on industry influences at several levels, from personal contacts to informational sessions or sponsored research. Informal conversations or interactions at conferences might shape academics’ thinking, as might discussions on funding committees, which often include industry members brought in for their perspective. Specifics matter: reformulating or developing products might require industrial collaboration, but the most peril comes when industries are involved in evaluating the impact on health of that product or making policy recommendations about it.

Government and charity funders who advocate industry partnerships should develop and perform structured risk analyses, as should universities and academic researchers who pursue such partnerships. As already required with political donations, public companies should disclose funds given to academics in regulatory filings and name those receiving the most money.

Scientists should get savvier: some ostensibly pro-science initiatives have industry involvement. Vested interests then target regulators by promoting ostensibly pro-science policies that have the effect of discounting or excluding unfavourable results. Industry also provides funds to individuals who make favourable recommendations to advisory panels, as documented in the past decade for dozens of physicians involved in drug approvals.

There is some movement in the right direction. The charity Cancer Research UK, along with several other bodies that award grants for cancer studies, does not fund scientists who are also supported by the tobacco industry. The UK Prevention Research Partnership, a £50-million ($64-million) funding initiative involving a number of government and charity bodies, has developed guidelines on industry partnerships and has indicated that it will support research into commercial influences on health. (It has also funded a £40,000 consortium-development grant from myself and colleagues, held by the University of Stirling, UK.)

Questioning the role of vested interests in research will not be popular or easy, but it is essential. Governments and academics: step up.


Linda Bauld is professor of health policy at the University of Stirling, UK, and deputy director of the UK Centre for Tobacco and Alcohol Studies.
e-mail: linda.bauld@stir.ac.uk

US space force

US Vice-President Mike Pence gave details of the White House’s plans for its proposed Space Force during a 9 August speech in Washington DC. Pence said the Space Force would be a military branch headed by a new assistant secretary of defence for space, with a suggested budget of US$8 billion over 5 years. The White House wants to establish the Space Force by 2020 to take over warfighting duties in space from the US Air Force, which Pence said would help to counter anti-satellite work by China and other countries and would strengthen national security. The idea has some support in Congress, which would have to approve the creation of the Space Force. The House of Representatives passed language supporting it in the 2018 authorization bill, but it did not survive the Senate.


Salk settlement

The Salk Institute for Biological Studies has settled two gender-discrimination lawsuits filed last year. Cancer researchers Katherine Jones and Victoria Lundblad have agreed to drop all claims against the institute, located in La Jolla, California, according to a statement released on 7 August by the pair and Salk’s president, neuroscientist Rusty Gage. A spokesperson for the Salk Institute declined to comment further on the matter, as did the lawyer who represents Jones and Lundblad. A third gender-discrimination lawsuit against the Salk Institute, filed last year by molecular biologist Beverly Emerson, is still pending. Emerson alleges that systemic bias at the institute hampered her access to research funding, laboratory space and other resources. In January, the Salk Institute declined to renew Emerson’s contract; Jones and Lundblad are still on staff. A court in San Diego, California, is scheduled to hold a hearing on Emerson’s case on 17 August.

Spacecraft set for the Sun

On 12 August, NASA launched its US$1.5-billion Parker Solar Probe, a mission that will skim through the Sun’s upper atmosphere and ‘taste’ the source of the solar wind. The spacecraft lifted off from Cape Canaveral, Florida, on a trajectory that will take it past Venus in October, for a gravitational nudge to its orbit, and then past the Sun in November. Over the next 7 years, it will fly close to the Sun 24 times, coming within 6.2 million kilometres of the surface. That will put the craft in the solar corona, where it will sample the magnetic, and other, energies that generate the stream of charged particles known as the solar wind, which drives space weather. It will be, by far, the closest any spacecraft has ever been to the Sun. The mission is named after US astrophysicist Eugene Parker, who proposed the existence of the solar wind and who, now aged 91, attended the launch of the satellite.

Ethics vote

The American Psychological Association (APA) has upheld its ban on psychologists working with the US military at secretive detention facilities. At the APA’s annual meeting, in San Francisco, California, on 8 August, the organization’s governing board voted, 105 to 57, to reject a proposal that would allow psychologists to work with the military at detention sites that have been accused of human-rights violations, even if only to provide care to inmates. The APA first instituted this ethics rule in 2008, after it emerged that psychologists had helped to design torture programmes used at the US detention facility in Guantanamo Bay, Cuba, and other offshore sites.


Plagiarism rules

India has for the first time introduced regulations for how universities should detect and punish acts of plagiarism. Sanctions for researchers or students caught breaking the rules range from requiring that a manuscript be withdrawn to sacking or expulsion, depending on the extent of the plagiarism. The regulations define plagiarism as “taking someone else’s work or idea and passing them as one’s own”, and will apply to the 867 universities and affiliated institutions that report to the nation’s education regulator, the University Grants Commission (UGC). The UGC announced on 3 August that the rules came into effect retroactively from 23 July. Previously, punishments for researchers who were caught plagiarizing were left to the discretion of the institution. The new rules also make it mandatory for institutions to use plagiarism-detection software on students’ theses and researchers’ manuscripts. Currently, only some universities use detection software. Several high-profile academics in India have been accused of plagiarism over the past decade.


Exam-score scandal

Officials at Tokyo Medical University have confirmed that university staff members have been lowering the scores of female students taking its entrance examination. The announcement follows an article in the Japanese newspaper Yomiuri Shimbun that revealed the practice, prompting an internal university investigation. The investigation found that staff had been reducing female applicants’ entry scores by 20 percentage points, with the practice beginning as early as 2006. An anonymous source at the university, quoted in the article, said the practice was designed to decrease the number of women accepted, because of the belief that many leave medicine to look after children. The university investigation found that staff members also lowered the scores of some men who had failed the exam multiple times. At a press conference on 7 August, one of the university’s managing directors, Tetsuo Yukioka, apologized for the practice and promised to eradicate it. Three days later, the education ministry launched a survey of the country’s 81 medical schools to look for evidence of similar procedures.

Asteroid close-up

The Japanese mission Hayabusa2 made its closest approach yet to the asteroid Ryugu on 7 August. Mission control at the Japan Aerospace Exploration Agency first manoeuvred the spacecraft from its hovering position 20 kilometres above the space rock to an altitude of 5 kilometres. Then, it let the probe free-fall to just 851 metres above the surface to measure Ryugu’s gravitational pull and therefore its mass. Ryugu, which is about 1 kilometre in diameter, is a type of asteroid that is common in the Solar System, but which has not been closely studied until now. Newly released images (pictured) show a surface strewn with boulders of various sizes. Hayabusa2 was launched at the end of 2014 and arrived at Ryugu in June. It is due to release several landers onto the rock, and the probe itself will also touch down to collect samples, which it will return to Earth in 2020.

Innovation cash

The UK government will invest at least £1 billion (US$1.3 billion) in its network of innovation centres over the next 5 years. The funding, which comes from the research budget and was announced on 10 August, will support six of ten existing centres known as Catapults, which are designed to bring academics and industry together to commercialize research. The six that secured funding focus on areas including gene therapy, semiconductors and renewable energy. The Catapults’ governing agency, Innovate UK, says that it will fund the remaining centres in due course. Three of these four facilities are under extra scrutiny after an independent review in 2017 suggested that they might not be delivering value for money.

Monsanto fined

A California jury has sided with a groundskeeper who says that he developed non-Hodgkin’s lymphoma after using the herbicide Roundup, made by the agricultural giant Monsanto. In a decision on 10 August, the jury said that the corporation had failed to warn consumers of the potential dangers of the product, which contains the widely used chemical glyphosate, and the judge ordered Monsanto to pay Dewayne Johnson US$289 million in damages. Glyphosate has become controversial amid arguments over whether it is carcinogenic. In 2015, the World Health Organization’s cancer-research arm said that the substance was “probably carcinogenic” to humans, on the basis of studies linking glyphosate exposure to cancer in rodents and humans, but the US Environmental Protection Agency and other bodies say the product is safe, pointing to other studies that find no connection to cancer. Johnson’s is the first such case against the company to go to court in the United States; thousands of similar suits are pending. In a statement, Monsanto’s vice-president, Scott Partridge, said that the “decision does not change the fact that more than 800 scientific studies and reviews support ... the fact that glyphosate does not cause cancer” and that it would appeal against the decision.

CORRECTION

The Seven Days item ‘Sterile mosquitoes’ (Nature 558, 306–307; 2018) mistakenly said that modified male mosquitoes were released into homes during a trial in Australia. In fact, they were released into neighbourhoods.


TREND WATCH: FISHING FOR TROUBLE

Tax havens have been linked to deforestation and illegal fishing, according to an analysis published on 13 August. Researchers scoured publicly available records to find out where fishing vessels were registered. Of the 209 ships found to be involved in illicit fishing, 70% were flagged under the jurisdiction of tax havens: territories that impose very low taxes and provide secrecy. By contrast, only 4.4% of the world’s total fishing vessels were registered in these territories. Most vessels involved in illegal fishing were registered in Belize [...] dubbed ‘flags of convenience’ states. These states impose few repercussions on ship owners who break international law.

Tax havens might protect unscrupulous fisheries and provide cash for environmentally damaging industries, yet these consequences have been ignored by policymakers, says lead author Victor Galaz, a scientist at Stockholm University. The new paper provides “the first global assessment to put this issue on the agenda”, Galaz says.

[Chart: Fishing for trouble.]
Movement of refugees and fighters in North Kivu is complicating the response by aid workers to an Ebola outbreak.


War zone complicates 
roll-out of Ebola vaccine 


Latest outbreak comes amid fighting in the eastern Democratic Republic of the Congo. 


BY AMY MAXMEN 


Aid workers in the Democratic Republic of the Congo began giving an experimental Ebola vaccine to health workers on 8 August, one week after the World Health Organization declared an outbreak of the virus. First responders and public-health staff are scrambling to contain the outbreak while planning how to distribute the vaccine to communities in the middle of a conflict zone.


The virus is spreading in North Kivu and Ituri, in the east of the Democratic Republic of the Congo (DRC). As of 12 August, 57 people had shown symptoms of Ebola, including 41 who had died. But violence perpetuated by more than 100 armed groups fighting over resources in those lush, green provinces has escalated this year ahead of a presidential election scheduled for December. This is the DRC’s tenth Ebola outbreak since 1976, but it is the first in this tumultuous eastern region.


“The situation is volatile,” says Ibrahima Socé-Fall, director of emergency operations for the World Health Organization (WHO) in Africa, based in Brazzaville in the neighbouring Republic of Congo.

Even so, in addition to dispensing the vaccine, researchers are preparing to give people with Ebola experimental antibody treatments and antiviral drugs.

Socé-Fall says that at least 2,000 doses of the vaccine remain in the country from the most recent Ebola outbreak, which ended in July, and more doses are on the way. The DRC will need a larger vaccine supply now, because the strategy deployed during the previous outbreak will not work for the current one.

During the previous outbreak, which lasted three months, officials vaccinated health workers, people who had had direct contact with someone with Ebola and the contacts of those contacts. But the instability in North Kivu and Ituri has made tracking such connections difficult. In towns where people have been infected but officials can’t track down their contacts, workers might vaccinate the entire community instead, says Socé-Fall.

An inability to track these connections worries epidemiologists because people on the move spread the virus. Humanitarian groups estimate that this year, nearly 750,000 people in North Kivu and Ituri have fled from militia fighters. And about one million refugees displaced from their homes by the violence over at least the past decade continue to travel frequently between cities in the region. Some refugees migrate to neighbouring countries such as Uganda, Rwanda and Burundi.

Aid agencies must now consider how to get into these conflict zones to fight Ebola without endangering their staff. Workers might travel with armed security escorts provided by the DRC government for their protection, said Peter Salama, the WHO’s head of emergency preparedness and response, at a press briefing on 3 August.

But a key organization fighting Ebola in the area, Médecins Sans Frontières (MSF; also known as Doctors Without Borders), hesitates to use that approach. The group feels that travelling with armed escorts hinders its ability to help people of various political affiliations, says Salha Isoufou, the head of MSF’s operation in DRC. So MSF will forgo the escorts.

The next phase of the response by the WHO, the DRC government and aid groups will be to use experimental drugs to treat people who have Ebola. A national review board that evaluates research ethics has approved the use of these treatments, and Steve Ahuka, a virologist at the National Institute for Biomedical Research based in Kinshasa, says that some therapeutics have just arrived in the region.

One treatment is an antibody called mAb114, which was manufactured by the US government. Others include the antiviral drugs favipiravir (T-705), made by Japanese company Toyama Chemical, and remdesivir (GS-5734), produced by Gilead, based in Foster City, California.

“Thanks to our experience from the previous outbreak, we are prepared,” says Ahuka.


Jason Priem (left) and Heather Piwowar co-founded Impactstory, which launched Unpaywall in 2016.


The rise and rise 
of Unpaywall 


Non-profit is a gift to many academics, and tie-ins with established scientific search engines could broaden its reach.


BY HOLLY ELSE 
is now at researchers’ fingertips. These deals are also enabling funders, librarians and others to study open-access publishing trends comprehensively for the first time.

“Unpaywall is a groundbreaking development,” says Alberto Martín-Martín, who studies bibliometrics and science communication at the University of Granada in Spain. “It takes us one step closer to achieving a true open research infrastructure.”

After participating in the 2011 hackathon, Piwowar and Priem founded a non-profit organization called Impactstory, in Vancouver, Canada, where they refined Unpaywall. (Parra is now a consultant at the World Bank in Asunción, Paraguay.)

Research by Priem and Piwowar published as a preprint in August 2017 (using Unpaywall, naturally) suggests that almost half of the recent research papers that people search for online are available for free (H. Piwowar et al. Preprint at PeerJ Preprints https://doi.org/10.7287/peerj.preprints.3119v1; 2017). But, says Priem, “there is a terrific gap between the availability and discoverability” of these papers, and it is this problem that Unpaywall hopes to solve.


Unpaywall consists of a database that includes a list of almost 20 million freely available scholarly articles. Most researchers access it using a free browser plug-in that was released in 2017. In June 2017, Unpaywall was integrated into a popular science search engine called Web of Science, which is operated by Clarivate Analytics. Dimensions, a service run by Digital Science that launched this year, used Unpaywall from the start. These companies, and now Elsevier, pay a subscription fee for a feed of Unpaywall’s database that is updated weekly. Impactstory also offers free access to the Unpaywall database (updated twice a year for non-subscribers).
Since its launch, Unpaywall’s technology has also been integrated into many university-library discovery systems, so that users can easily find freely available versions of research papers in institutional repositories. These archives, which are operated by universities, funders and others, host the lion’s share of articles in Unpaywall’s database, but were difficult to search systematically in the past.
Scientists using Scopus can filter their results to find freely available papers, but the database links to only about 1.5 million papers published in fully open-access journals. Once Unpaywall’s integration is complete in November 2018, searches carried out on Scopus for free-to-read literature will also find articles on publisher platforms, even if the journal publishes a mix of open-access and paywalled articles.

This will boost the number of freely available articles in Scopus to 7 million, which is still around 13 million articles fewer than are listed in Unpaywall’s database (see ‘Unpaywall revolution’). This gap exists because Scopus will not initially link to articles posted in repositories.

NEW FRONTIERS

Large citation databases such as Scopus and Web of Science list the majority of all research articles. By integrating their records with Unpaywall data, researchers can systematically measure the proportion of the literature that is freely available, a feat that wasn’t previously possible. The US National Institute of Mental Health (NIMH), which has an overall budget of around US$1.5 billion, is working with Impactstory to develop a bespoke tool that uses Unpaywall. The agency wants to track the extent to which researchers working at NIMH laboratories in Bethesda, Maryland, and nearby Rockville are making their papers, data and source code freely available.

UNPAYWALL REVOLUTION
Software called Unpaywall automatically searches the Internet for free-to-read versions of research papers. Integration with established science databases, including Scopus and Web of Science, will expand the number of freely accessible papers they deliver.
[Chart: freely available articles compared with total articles in Web of Science and Scopus.]

For Priem, making Unpaywall a go-to tool for researchers is just the start. Last month, Impactstory secured an $850,000 grant to create a search engine aimed at non-scientists. It will also use artificial intelligence to summarize journal articles in its database in plain language, so that non-specialists can understand them. “20 million articles are free for everyone to read but might as well be closed if there is no way for any average person to access it,” he says. “We’re not yet finished.”


DRUG DEVELOPMENT


Gene-silencing drug approved 


US government okays first RNA-interference drug after a 20-year wait.


BY HEIDI LEDFORD 


US regulators have approved the first therapy based on RNA interference (RNAi), a technique that can be used to silence specific genes linked to disease. The drug, patisiran, targets a rare condition that can impair heart and nerve function.

The approval, announced by the US Food and Drug Administration on 10 August, is a landmark for a field that has struggled for nearly two decades to prove its worth in the clinic. Researchers first discovered RNAi 20 years ago (A. Fire et al. Nature 391, 806–811; 1998), sparking hopes of a revolutionary new approach to medicine. Since then, however, a series of setbacks has lessened those expectations.

“This approval is key for the RNAi field,”
says James Cardia, head of business
development at RXi Pharmaceuticals in Marlborough,
Massachusetts, which is developing RNAi
treatments. “This is transformational.”

Patisiran works by silencing the gene that 
underlies a rare disease called hereditary 
transthyretin amyloidosis. In that illness, 
mutated forms of the protein transthyretin 
accumulate in the body, sometimes impairing 
heart and nerve function.

The drug’s approval means that pharmacology
textbooks will need to be rewritten, says
Ricardo Titze-de-Almeida, who studies RNAi
at the University of Brasilia. “We are
inaugurating a new pharmacological group,” he says.
“We will have many more such drugs in the
coming years.”

This was the hope when Alnylam, the
company in Cambridge, Massachusetts, that
developed patisiran, launched in 2002. Four
years later, the Nobel Prize in Physiology or
Medicine was awarded to two RNAi pioneers:
Andrew Fire of Stanford University School of
Medicine in California and Craig Mello of the
University of Massachusetts Medical School in
Worcester.

But to make RNAi into medicine, developers
would first need to determine how to deliver
delicate molecules of RNA safely to their target
organs. They needed a way to shield the RNA
from degradation in the bloodstream,
prevent it from being filtered out by the kidneys,
and allow it to exit blood vessels and spread
through tissues. “That proved to be a
substantially harder problem than we anticipated,”
says Douglas Fambrough, chief executive of
Dicerna, an RNAi-focused company in
Cambridge, Massachusetts.

As researchers grappled with the delivery
puzzle, investors began to lose confidence. In
2008, analyst Edward Tenthoff of investment
bank Piper Jaffray in New York City advised his
clients to stop buying Alnylam stock. “We saw
the promise in the technology, but the delivery
was lacking,” he says.
By 2010, large pharmaceutical companies
were also losing their appetite for RNAi,
severing collaborations and ending internal
research programmes. “By and large, big
pharma left RNAi for dead,” says Fambrough.
Safety concerns dealt the field another blow
in 2016, when Alnylam abandoned one of its
leading RNAi programmes after finding a
possible link to patient deaths in a clinical trial (see
‘Ups and downs’).

But gradually, some RNAi companies began
to iron out the kinks in their delivery systems.
Alnylam experimented with a number of
delivery routes and target organs, encasing some
of its RNA molecules in fatty nanoparticles
or chemically modifying the RNAs to help
them survive the perilous journey through
the bloodstream.

RNAs protected in this way and injected
into the bloodstream tended to accumulate in
the kidneys and liver. This led the company
to look at transthyretin, which is produced
mainly in the liver. In a clinical trial in 225
people with hereditary transthyretin amyloidosis
who showed signs of nerve damage, average
walking speed significantly improved in those
who received the treatment (D. Adams et al.
N. Engl. J. Med. 379, 11–21; 2018). Walking
speed declined in the placebo group.

[Figure: UPS AND DOWNS. The biotech firm Alnylam faced several setbacks
before winning US government approval for its first RNA-interference drug.]

In the future, Alnylam and others will be
able to move beyond the liver, says company
co-founder Thomas Tuschl, a biochemist
at Rockefeller University in New York City.
Quark Pharmaceuticals of Fremont, California,
is testing RNAi therapies that target proteins in
the kidneys and the eye. Alnylam is
developing ways to target the brain and spinal cord,
and Arrowhead Pharmaceuticals of Pasadena,
California, is working on an inhalable RNAi
treatment for cystic fibrosis.

“I’ve never been more optimistic about the
future of RNAi,” says Fambrough. “All of those
tear-your-hair-out days were worth it to get to
today.”

Advances in RNA delivery might also benefit
researchers who are developing gene-editing
therapies based on the popular technique
CRISPR-Cas9. That system uses a DNA-cutting
protein called Cas9, which is guided to the
desired site in the genome by an RNA molecule.

Like RNAi before it, CRISPR-Cas9 has
become a common tool in genetics
laboratories. But it might still face a difficult and
lengthy path to the clinic. Much like ordinary
drugs, RNAi therapies will break down over
time; a gene edit, however, is intended to be
permanent, which amplifies safety concerns.

“I hope they can do it more quickly than
we did it, but I would not expect it to be so
smooth,” says Fambrough. “I wish them the
best of luck.”


Outrage over changes to 
EPA chemical assessments 


Critics say US environment agency’s revisions favour industry over academic research. 


BY JEFF TOLLEFSON 


The US Environmental Protection
Agency is making major changes to
the way in which it evaluates chemicals
for environmental and public-health effects.
The latest push includes changes to chemical
safety guidelines that place greater weight on
industry-sponsored research, among other
things, and is a part of efforts by US President
Donald Trump’s administration to reshape
how the agency uses science to make decisions.
The Environmental Protection Agency
(EPA) issued its chemical-assessment guidance
in May, and is soliciting public comments until
16 August. The guidance contains changes
dictating the kind of data that studies must
include in order to be considered in the EPA’s
decision-making process. Researchers and
environmental and public-health advocates
say that the guidelines provide a
non-peer-reviewed alternative to the EPA’s main system
for conducting chemical reviews and
calculating acceptable exposure limits. The agency is
required by law to do these evaluations, but the
guidance defines how officials conduct them.
At stake are tens of thousands of chemicals
destined for public use and governed by the
1976 Toxic Substances Control Act (TSCA).

The guidance dovetails with a rule proposed
in April by then-EPA administrator Scott
Pruitt, which, if finalized and implemented,
would reduce the role of published scientific
studies in decision-making across the agency.
The changes also coincide with attacks on the
EPA’s core chemical-assessment programme,
known as the Integrated Risk Information
System (IRIS), by industry and Republican
politicians over the past year.

In a statement to Nature, the EPA says the
changes are meant to provide clear criteria to
help determine the quality of the research used
to evaluate chemicals, and that the
guidance is a work in progress that can be revised
in response to new information. But scientists
say the process laid out by the EPA is at odds
with established, peer-reviewed procedures for
such assessments.


Jennifer Sass, a senior scientist at the Natural
Resources Defense Council, an advocacy
group based in New York City, suspects that
the goals are to promote science from industry
and change the calculations that the EPA uses
to develop regulations and estimate safe
exposure limits for chemicals.


MEETING THE REQUIREMENTS
The guidelines introduce many data-reporting
requirements (including statistical
analyses that measure whether a study
correctly identifies the presence of an effect)
that are standard for industry-funded
research. But because such criteria vary
among peer-reviewed journals, many
academic studies would be disqualified, says
Tracey Woodruff, who led the development
of a chemical-evaluation process at the
University of California, San Francisco. “Only
industry studies will survive.”

The changes represent a major shift because
they create a new system for chemical
risk assessments under TSCA. Unlike
IRIS, the process introduced by the Trump
administration has not been peer reviewed,
and yet it would allow agency officials to
circumvent IRIS evaluations. Under former
president Barack Obama, the EPA would have
used IRIS to perform these reviews when
considering regulations under TSCA.

The IRIS programme dates back to 1985,
but under the Obama administration, the EPA
modernized and standardized its
chemical-evaluation procedures to improve
transparency and confidence in its health assessments.
Woodruff says that the IRIS process is solid
and that bypassing it would be a mistake.

“The TSCA office is deciding to ditch all of
the experts and empirical methods that have
been developed over the last 30 years for a
method that appears to be based on their whim
and personal opinion,” she says.

But the EPA insists that the review process 
used in these chemical evaluations is intended 
to “comprehensively capture all available 

Politicians in the US House of
Representatives have also hammered IRIS, holding
hearings questioning the quality and validity of
the programme’s assessments. The political
manoeuvring parallels efforts from industry to
bypass scientific reviews of certain chemicals.

One plant in LaPlace, Louisiana, makes the
chemical chloroprene for the Tokyo-based
company Denka. Chloroprene is used to make
neoprene, a synthetic rubber integral to
products such as wetsuits. A 2010 IRIS evaluation
and subsequent government studies suggested
that chloroprene exposure levels in LaPlace
were high enough to increase cancer risk in
some areas of the city. Denka challenged that
ruling last year, arguing that the assessment
was incorrect. The company lost its challenge
in January but has since appealed against that
ruling. A panel appointed by the EPA
leadership will now consider the appeal.

[Photo: A chemical produced in a US-based plant is part of a challenge to a government safety programme.]

Denka has argued to its political allies that
reducing chloroprene emissions would be too
expensive, says Karl Brooks, a former EPA
official who last year served as a consultant
in a lawsuit filed by LaPlace residents against
Denka. That’s a potentially dangerous
development, he says, because IRIS assessments are
meant to focus on the health effects of
chemicals, not the economic challenges that a
company might face as a result of the core science.

Researchers fear that the chloroprene case
represents yet another strategy for companies
seeking relief from the burdens of regulations:
challenge the science and, when that fails,
appeal to friendly politicians and political
appointees.


PARTICLE PHYSICS


LHC teams turn to 
brute-force hunt 


World’s most-powerful particle collider is using a fresh
approach to find evidence of ‘new’ physics.


BY DAVIDE CASTELVECCHI 


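A once-controversial approach to particle
physics has entered the mainstream at the Large Hadron Collider
(LHC). The LHC’s major ATLAS experiment
has officially thrown its weight behind the
method, an alternative way to hunt through
the reams of data created by the machine,
as the collider’s best hope for detecting behaviour
that goes beyond the standard model of particle
physics. Conventional techniques have
so far come up empty-handed.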

So far, almost all studies at the LHC (at
CERN, Europe’s particle-physics laboratory
near Geneva, Switzerland) have involved
targeted searches for signatures of favoured
theories. The ATLAS collaboration now describes its
first all-out ‘general’ search of the detector’s data
(a kind of brute-force approach) in a
preprint posted last month and submitted to
European Physics Journal C (ATLAS Collaboration.
Preprint at https://arxiv.org/abs/1807.07447;
2018). Another major LHC experiment, CMS,
is working on a similar project.

“My goal is to try to come up with a really new
way to look for new physics,” one driven by
the data rather than theory, says Sascha Caron
of Radboud University Nijmegen in the
Netherlands, who has led the push for the approach
at ATLAS. General searches are to the targeted
ones what spell-checking an entire text is to
searching for a particular word. The
searches could realize their full potential soon,
when combined with increasingly sophisticated
artificial-intelligence (AI) methods.

LHC researchers hope that the methods
will lead them to their next big discovery,
something that hasn’t happened since the
detection of the Higgs boson in 2012, which put in
place the final piece of the standard model. The
model describes all known subatomic particles,
but physicists suspect that there is more to the
story: the theory doesn’t account for dark
matter, for instance. But big experiments such
as the LHC have yet to find evidence for this
behaviour. That means it’s important to try new
things, including general searches, says Gian
Giudice, who heads CERN’s theory department
and is not involved in any of the experiments.
“This is the right approach, at this point.”


[Photo: The ATLAS detector at the Large Hadron Collider near Geneva, Switzerland.]

COLLISION COURSE
The LHC smashes together millions of protons
per second at colossal energies to produce a
profusion of decay particles, which are recorded by
detectors such as ATLAS and CMS. Many
different types of particle interaction can produce
the same debris. For example, the decay of
a Higgs might produce a pair of photons,
but so do other processes. So, to search for the
Higgs, physicists first ran simulations to predict
how many of those ‘impostor’ pairs to expect.
They then counted all photon pairs recorded
in the detector and compared them to their
simulations. The difference (a slight excess of
photon pairs within a narrow range of energies)
was evidence that the Higgs existed.
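
The counting logic described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the ATLAS or CMS analysis: the expected background and the observed count are made-up numbers, and the helper name `poisson_tail` is hypothetical. It asks how likely the simulated background alone would be to produce at least the observed number of photon pairs.

```python
import math

def poisson_tail(observed: int, expected_background: float) -> float:
    """Return P(N >= observed) for N ~ Poisson(expected_background)."""
    # Accumulate P(N = k) for k < observed, then take the complement.
    term = math.exp(-expected_background)  # P(N = 0)
    below = 0.0
    for k in range(observed):
        below += term
        term *= expected_background / (k + 1)  # P(N = k + 1) from P(N = k)
    return max(0.0, 1.0 - below)

# Hypothetical numbers for one narrow energy window (assumptions, not data).
expected = 100.0  # 'impostor' photon pairs predicted by simulation
observed = 130    # photon pairs counted in the detector
p_value = poisson_tail(observed, expected)
print(f"excess = {observed - expected:.0f} pairs, p-value = {p_value:.4f}")
```

A small p-value means the background-only simulation rarely produces so many pairs, which is what makes a narrow excess interesting.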


ATLAS and CMS have run hundreds of these
targeted searches to look for particles not in the
standard model, but the searches have come up
empty so far. This leaves open the possibility
that there are exotic particles that produce
signatures no one has thought of, something that
general searches have a better chance of finding.
Whereas targeted searches typically look
at only a handful of the many types of decay
product, the latest study looked at more than
700 types at once. The study analysed data
collected in 2015, the first year after an LHC
upgrade raised the energy of proton collisions
in the collider from 8 teraelectronvolts (TeV)
to 13 TeV. At CMS, Meyer and a few
collaborators have conducted a proof-of-principle study,
which hasn’t been published, on a smaller set of
data from the 8 TeV run. Neither experiment
has found significant deviations so far. This
was not surprising, the teams say, because the
data sets were relatively small. Both ATLAS and
CMS are now searching a larger trove of data.

The approach has clear advantages, but also
clear shortcomings, says Markus Klute, a
physicist at the Massachusetts Institute of
Technology in Cambridge who is part of CMS and has
worked on general searches for previous
experiments. One limitation is statistical power. If a
targeted search finds a positive result, there are
standard procedures for calculating its
significance; when casting a wide net, however, some
false positives are bound to arise, one reason
that general searches had not been favoured in
the past. But the teams say they have put a lot
of work into making their methods more solid.
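
A short calculation shows why a wide net is bound to catch some false positives. The sketch below assumes, purely for illustration, that the event types are statistically independent and that each is tested at the same local threshold; real analyses deal with correlated channels and use more careful corrections. The figure of 700 comes from the text; the thresholds and function names are hypothetical.

```python
def chance_of_any_false_positive(n_channels: int, local_p: float) -> float:
    """Probability that at least one of n independent, background-only
    channels fluctuates past the local p-value threshold."""
    return 1.0 - (1.0 - local_p) ** n_channels

def expected_false_positives(n_channels: int, local_p: float) -> float:
    """Average number of channels that pass the threshold by chance."""
    return n_channels * local_p

n = 700  # roughly the number of event types examined in the general search
for local_p in (0.05, 0.01, 0.001):
    any_fp = chance_of_any_false_positive(n, local_p)
    mean_fp = expected_false_positives(n, local_p)
    print(f"local p = {local_p}: P(any) = {any_fp:.3f}, expected = {mean_fp:.1f}")
```

Under these assumptions, a threshold that would be striking in a single targeted search is almost certain to be crossed somewhere among hundreds of channels, so the significance of any one deviation has to be discounted accordingly.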

Proponents of this approach hope to use
machine learning to find patterns in the data
without any theoretical bias. “We want to
reverse the strategy: let the data tell us where
to look next,” Caron says.


Model 
citizens 


HOW DIGITAL DRUG DEALERS AND 
VIRTUAL USERS ARE PROVIDING CLUES 
TO HELP STOP THE US OPIOID EPIDEMIC. 


By Sara Reardon 


With the tip of her syringe, Brandi pokes at a grey lump of
heroin in a spoon. It’s a new variety of the drug that has shown
up on the market in the past few days, and Brandi likes it. “I
feel this more, I feel more of the pain resistance,” she says.

Once it has dissolved into a liquid, she injects it into her arm, then
uses a fresh needle to inject the skinny arm of another woman. “She does
it better than the hospital,” the woman comments.

“I’ll help anybody who needs it,” Brandi explains to public-health
researcher Daniel Ciccarone of the University of California, San
Francisco, who has been filming the entire process.

Ciccarone’s team has embedded with Brandi (whose name has been
changed for this story) in Charleston, West Virginia, documenting
her interactions without judgement or interference. Later, the group
will analyse this video, in addition to half a dozen other videos of drug
users from across the city, logging details big and small. Brandi does not
heat the solution on the spoon, for instance, and that may increase the
likelihood of spreading viruses such as HIV. And tests reveal that what
she’s taking has been laced with fentanyl, a synthetic drug up to 50 times
more powerful than heroin.

[Photo: Philadelphia in Pennsylvania recorded more than 1,200 overdose deaths in 2017.]

The researchers will plug these data into powerful computer
simulations of Charleston, populated by thousands of virtual Brandis: heroin
users and dealers going about their daily routines. They will watch these
agents buy more heroin as their tolerance increases, form networks
with sellers and users and, in some cases, accidentally overdose.

Ciccarone’s is one of several groups using agent-based models to
understand what is driving the US opioid epidemic, the dramatic rise over
the past two decades in the use of opioids, including prescription pain
medications and illegal drugs such as heroin. By studying the
motivations and practices of real drug dealers and users, the researchers hope to
build agents whose behaviour in the virtual world mimics that in real life.


Agent-based models promise to provide a more granular view of the
opioid crisis than standard modelling, which is based on average
populations, and to capture some of the complexity of the driving forces.
This could prove important for demonstrating the effects of opening
[…] methadone clinics or needle exchanges. The models allow
scientists to compare interventions at almost no cost and could help
policymakers to decide how to proceed in the real world. “It’s a very
classic and useful way to try and see where is the best place to deploy
an intervention to have the biggest effect,” says John Brooks, a medical
adviser for the division of HIV/AIDS prevention at the US Centers for
Disease Control and Prevention (CDC) in Atlanta, Georgia.

Although such simulations have long been used to model disease
outbreaks and have, in some instances, guided public policy, their track
record with more complex social behaviour such as drug use is limited,
largely owing to sparse data and the breadth of parameters to consider.

Still, scientists hope that agent-based models can lay out scenarios for
decision-makers, who are often driven more by politics than data. “The
barriers are not scientific or medical,” Ciccarone says. “You can throw
$1 billion at West Virginia, and they may or may not know how to use it
well.” These virtual worlds can add clarity, says Joshua Epstein, director of
the Agent-Based Modeling Lab at New York University. “You can literally
watch the thing unfold before your eyes,” he says.


THE DIFFERENCE IN THE DETAILS
The US opioid crisis is estimated to kill 115 people per day through
overdoses, and has run up US$1 trillion in health-care costs and lost
productivity since 2001. It is not the first addiction crisis that the United
States has faced, nor is it the most severe. Alcohol use causes many more
deaths, and the rate of cocaine overdose among African Americans is
similar to the rate of opioid overdose in white Americans.

But the opioid crisis does have some different driving factors,
including the prevalence of prescription drugs, which many have used on the
way to abusing illegal drugs, and the introduction of fentanyl, which
is often used to boost the potency of heroin and is responsible for a
large share of overdose deaths. The epidemic has also hit hard in rural
settings, where services and infrastructure for dealing with addiction
are scarce. “The demographic now encompasses a population that in
the past has not been so affected,” says Nora Volkow, director of the US
National Institute on Drug Abuse in Bethesda, Maryland.

As a result, researchers are coming up with fresh ways of thinking
about the crisis. It bears similarities to a disease epidemic, for example,
in the way it spreads through networks based on personal relationships
and physical proximity, says Georgiy Bobashev, a data scientist at the
non-profit research institute RTI International in Research Triangle
Park, North Carolina. “Nobody is born an addict. Somebody has to
teach you how to smoke or how to inject.”

These personal networks can be replicated using agent-based
modelling. Unlike other types of model, which may rely on average
characteristics or relationships between homogeneous groups to inform algorithms,
agent-based models allow researchers to see subtle connections between
people. “That’s useful because drug use and
overdose is inherently personal,” says
epidemiologist Brandon Marshall of Brown
University in Providence, Rhode Island. Factors
such as job loss, mental health or genetics can
influence how likely a person is to begin using
drugs or become addicted, but those factors
might fade into the averages if researchers
looked at a population as a whole.

To create an agent-based model, researchers
first ‘build’ a virtual town or region, sometimes
based on a real place, including buildings such
as schools and food shops. They then populate
it with agents, using census data to give each
one its own characteristics, such as age, race
and income, and to distribute the agents throughout the virtual town.

The agents are autonomous but operate within pre-programmed
routines: going to work five times a week, for instance. Some behaviours
may be more random, such as a 5% chance per day of skipping work, or
a 50% chance of meeting a certain person in the agent’s network. Once
the system is as realistic as possible, the researchers introduce a variable
such as a flu virus, with a rate and pattern of spread based on its
real-life characteristics. They then run the simulation to test how the agents’
behaviour shifts when a school is closed or a vaccination campaign is
started, repeating it thousands of times to determine the likelihood of
different outcomes.
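
The recipe above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not any research group’s model: agents have only a workplace, infection spreads only among colleagues who show up on the same day, and every number except the 5% chance of skipping work is a hypothetical placeholder. The names `run_once` and `mean_outbreak_size` are invented for this sketch.

```python
import random

def run_once(rng, n_agents=200, n_workplaces=10, days=40,
             p_skip_work=0.05, p_transmit=0.03, days_infectious=5,
             vaccinated_fraction=0.0):
    """Simulate one outbreak; return how many agents were ever infected."""
    workplace = [rng.randrange(n_workplaces) for _ in range(n_agents)]
    immune = [rng.random() < vaccinated_fraction for _ in range(n_agents)]
    days_left = [0] * n_agents         # > 0 means currently infectious
    ever_infected = [False] * n_agents
    # Introduce the virus through one unvaccinated agent, if any exists.
    candidates = [i for i in range(n_agents) if not immune[i]]
    if not candidates:
        return 0
    first = rng.choice(candidates)
    days_left[first] = days_infectious
    ever_infected[first] = True

    for _ in range(days):
        # Routine: each agent goes to work unless they skip that day.
        at_work = [rng.random() >= p_skip_work for _ in range(n_agents)]
        infectious_here = [0] * n_workplaces
        for i in range(n_agents):
            if at_work[i] and days_left[i] > 0:
                infectious_here[workplace[i]] += 1
        newly = []
        for i in range(n_agents):
            if at_work[i] and not immune[i] and not ever_infected[i]:
                k = infectious_here[workplace[i]]
                # Chance of catching it from at least one of k colleagues.
                if k and rng.random() < 1 - (1 - p_transmit) ** k:
                    newly.append(i)
        for i in range(n_agents):
            if days_left[i] > 0:
                days_left[i] -= 1
        for i in newly:
            days_left[i] = days_infectious
            ever_infected[i] = True
    return sum(ever_infected)

def mean_outbreak_size(vaccinated_fraction, runs=100, seed=1):
    """Repeat the simulation many times and average the outcome."""
    rng = random.Random(seed)
    sizes = [run_once(rng, vaccinated_fraction=vaccinated_fraction)
             for _ in range(runs)]
    return sum(sizes) / runs

print("no campaign:   ", mean_outbreak_size(0.0))
print("60% vaccinated:", mean_outbreak_size(0.6))
```

Repeating the run many times matters because each outbreak is random: a single run can fizzle out or take off, whereas the average over runs shows how an intervention shifts the likely outcomes.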

In 2015, data from an agent-based model developed at the University
of Pittsburgh in Pennsylvania helped California state senator Richard
Pan to gain support for a bill on mandatory vaccination in his state.
Pan used the simulation to demonstrate to his fellow senators how
measles outbreaks could unfold in their home districts. “It certainly
made an impact on them,” Pan says. “Instead of just describing it in
more abstract terms, [the model] can make it more concrete.” The bill
ultimately passed, and immunization rates increased.

As computers have improved, researchers have begun adapting
agent-based models to look at sociological and behavioural trends that require
more computing power because of the number of variables they contain.
Some groups use the technique for crisis modelling, and Australia has
begun intervention studies for child obesity on the basis of the findings
of an agent-based model.

In response to the opioid epidemic, Bobashev’s group has constructed
Pain Town, a generic city complete with 10,000 people suffering from
chronic pain, 70 drug dealers, 30 doctors, 10 emergency rooms and
10 pharmacies. The researchers run the model over five simulated years,
recording how the situation changes each virtual day.

During this time, the patients’ drug tolerance increases, leading them
to find different ways of acquiring drugs. Their behaviour is driven by
variables such as the chance that a doctor will increase their prescription,
or the likelihood that a dealer will have enough heroin. At a certain
threshold, patients become addicted or more likely to overdose. Bobashev’s early
data suggest, for example, that requiring doctors to track patients’
medication history can be effective over the long term, but not immediately.
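
The kind of day-by-day rule set described above can be sketched as follows. This is not Bobashev’s Pain Town: it is a deliberately simplified toy with no individual doctors, dealers or pharmacies, and every parameter value and name (`simulate_patient_days`, `tolerance_gain`, `p_doctor_increase` and so on) is a hypothetical assumption chosen only to show the mechanics.

```python
import random

def simulate_patient_days(rng, n_patients=1000, days=365,
                          tolerance_gain=0.01, p_doctor_increase=0.3,
                          p_dealer_has_supply=0.7, p_overdose_street=0.001):
    """Toy run; returns (patients who turned to dealers, overdose events)."""
    dose = [1.0] * n_patients       # prescribed dose, arbitrary units
    tolerance = [1.0] * n_patients  # dose needed for pain relief
    uses_street = [False] * n_patients
    overdoses = 0
    for _ in range(days):
        for i in range(n_patients):
            tolerance[i] += tolerance_gain  # tolerance rises every day
            if tolerance[i] > dose[i] + 0.5 and not uses_street[i]:
                # The prescription no longer covers the pain:
                # the patient first asks a doctor for a higher dose.
                if rng.random() < p_doctor_increase:
                    dose[i] = tolerance[i]
                elif rng.random() < p_dealer_has_supply:
                    # Refused, and a dealer has heroin available.
                    uses_street[i] = True
            if uses_street[i] and rng.random() < p_overdose_street:
                overdoses += 1
    return sum(uses_street), overdoses

street, overdoses = simulate_patient_days(random.Random(42))
print(f"{street} patients turned to dealers; {overdoses} overdose events")
```

Changing one argument at a time, for example lowering `p_dealer_has_supply` or `p_doctor_increase`, is the toy equivalent of testing an intervention and watching how the totals respond over the simulated period.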

The model contains many assumptions and simplifications, Bobashev
says. For example, it doesn’t capture the fact that the rate at which people
develop tolerance and addiction can depend on factors such as genetics,
and that whether a person switches from prescription drugs to heroin
can depend on the relative availability of the two drugs.

But researchers can adjust models such as Pain Town to test various
interventions, such as increasing access to emergency rooms, arresting
dealers or equipping police with naloxone (a drug that reverses opioid
overdoses) to see how the system reacts and whether it affects the
number of deaths over time. And as models become more sophisticated, the
researchers may be able to incorporate more factors, such as people who
are not taking pain medications but are susceptible to trying opioids
for the first time.

Models can also be useful for understanding why individual places
or situations may differ, says Christopher Barrett, a computer scientist
at Virginia Tech in Blacksburg. For instance,
heroin and fentanyl might be easier to come
by in cities near ports, whereas doctors may
be the main source of opioids in a suburban
or rural setting. Interventions focused on
prescribing practices, therefore, would have
different effects in each case.

Such models can also reveal feedback
loops, such as the link between economic
downturns and opioid use. Some
epidemiological studies have suggested that factors
such as unemployment tend to predict
suicide and addiction, especially in white male
populations. And addiction can lead to
further job loss and lower productivity, harming
the economy. Agent-based models could investigate loops such as this,
providing ideas for how to mitigate the effects, Barrett says.

In May, Bobashev and Ciccarone presented results from one of their
agent-based models at a meeting of the International Society for the
Study of Drug Policy in Vancouver, Canada. Their findings suggested
that the increased prevalence of white-powder heroin (a newer form
of the drug in the United States) may increase the risk of HIV
spreading among injection drug users. The reason, also supported by the
model, is that unlike black-tar heroin, users don’t need to heat the drug
to dissolve it, and heating kills the virus.

Bobashev and Ciccarone are working on models of how younger
heroin users begin using the drug. Unlike older users, who experienced
the rise of the HIV epidemic in the 1980s, newer users may be less likely
to adopt safe practices. The models suggest that the United States may
see more localized HIV outbreaks, similar to the recent outbreak in
Scott County, Indiana. That region experienced 181 new HIV cases
between November 2014 and November 2015, compared with fewer
than 5 cases per year previously. Opioid use is thought to be the cause.
Agent-based models might help to stem future outbreaks by guiding
surveillance priorities.

One of the most sophisticated agent-based models is the University of
Pittsburgh’s system, known as FRED (A Framework for Reconstructing
Epidemiological Dynamics). It fits population census data to maps of
geographical regions around the country, allowing researchers to track
virtual individuals in the area in a realistic way. It was data from these
models that helped to convince Pan and his fellow state senators to pass
legislation on mandatory vaccination. The FRED team is now beginning
to use the system for opioid modelling, training it on historical trends.
Pan, who is also a physician, says he is intrigued by the prospect. “If
there’s a way to actually model in different communities which factors
would have the biggest impact, that would be helpful,” he says.


DATA DROUGHT
The models face numerous challenges before they will be ready for
widespread adoption, primarily data gaps. Marshall says that research
teams struggle to get access to data on opioid prescriptions that are held
by manufacturers, pharmacies and law-enforcement agencies. It is also
difficult to obtain government information on drug cartels and the type
and rate of drugs flowing into the country. Other data simply do not
exist in usable form: agencies may record deaths due to drug overdose,
for instance, but fail to specify which drug was responsible.

Observing drug users such as Brandi can provide certain types of
information more quickly and accurately. “Drug users know their
chemicals intimately,” Ciccarone says.

Lee Hoffer is a cultural anthropologist at Case Western Reserve
University in Cleveland, Ohio, who studies heroin markets and collaborates
with Bobashev. He says the ethnographic data that his group and others
are collecting could help to fill some of the information gaps. “We’re
trying to enter their world as interlopers to see how they see their life.”
After an initial awkward period, he says, drug users tend to become
more honest with the researchers, telling them crucial information such
as how they form networks with dealers and the cost of drugs.

Understanding the psychology of drug users is also crucial, says
Epstein. Most decision-making models assume rational behaviours.
In reality, emotions, misinformation and irrational calculations play a
major part. “When you put them together you get collections of
dynamics that are very dysfunctional.”

[Photo: Youngstown in Ohio has high rates of unemployment and opioid addiction.]

Epidemiological data may soon be available to buttress the models.
The CDC and the US National Institute on Drug Abuse have started
several major surveys of drug-use patterns. A number of states have also
begun collecting epidemiological information on trends in overdose
and addiction. And research groups such as the University of Pittsburgh
team are working with multiple health agencies to collate their findings
in a single database, which can inform FRED and other models.

But no matter how advanced the models become, implementing
interventions based on their findings is an enormous challenge. Models may
reveal socioeconomic contributors that cannot be easily addressed by
policies, and politics can stand in the way of proven solutions. In April,
Ciccarone had to cancel his work in Charleston, at least for the time being, after
a needle-exchange clinic with which he had been collaborating closed
owing to political pressure. “They were seeing 300 people on a Wednesday
afternoon because there’s a lot of need,” he says. “It’s a huge loss.”
Increasing work is being done to determine the relative impact of
interventions. The US National Institutes of Health (NIH) in April
announced $96 million for a programme that will partner with
health-care systems and local governments to carry out evidence-based
public-health interventions in different locations, evaluating them as they go.
“This is the first time this [has been] done for a particular
substance-abuse disorder,” says Volkow. The NIH is now asking researchers who
want to apply for these funds to justify the size and scope of their
proposed studies with data from models, including agent-based models.

But these studies will take many years to complete. And Bobashev
says that society cannot afford to wait for the science to be perfect. “By
the time this data is collected, tens of thousands, if not hundreds of
thousands, more deaths will have occurred.”


Sara Reardon writes for Nature from Washington DC.

Fish and other marine life are affected by ocean weather: drastic variations in temperature, pH, oxygen and salinity that are in turn influenced by climate change.


Biologists ignore ocean 
weather at their peril 


Ecologists must understand how marine life responds to changing local conditions, 
rather than to overall global temperature rise, say Amanda E. Bates and 16 colleagues. 


The ocean can turn on a dime. Temperature, pH, oxygen levels and
salinity can vary drastically, across distances of centimetres and
within time frames of minutes. That’s the latest view being revealed by
measurements from thousands of instruments anchored to shores or
attached to floats, ocean gliders and ships.
Yet many people think of oceans as a relatively constant environment.
That idea might have been hatched when researchers on the HMS Challenger
expedition of 1872–76 tracked water temperature and currents and lowered
weights to gauge depth at thousands of sites across the world’s seas.
The global picture that emerged after averaging these data was one of
stability in which any variability had been lost. Certainly, that
picture was reinforced by twentieth-century images of Earth from space,
showing the world’s ocean as a uniform deep blue.

Most biologists and ecologists trying to understand how ocean
biodiversity is affected by climate change focus on large-scale averages
across space and time. They try to predict, for instance, how a mean
global temperature rise of 2 °C could affect marine life such as
bacteria, phytoplankton, fish and other creatures. To do this, they use
projected changes in the mean temperature of the ocean. These are based
on estimates from satellites, which measure the temperature of only the
top few millimetres of seawater.

But organisms experience and respond to local shifts in ocean weather
that occur over weeks, hours and minutes, rather than to changes in
climate per se, which unfold over years and decades (although long-term
climate changes drive the short-term shifts). A handful of studies that
attempt to investigate how local physical conditions affect species
(including the numbers of individuals and types of species occurring)
are beginning to show the value of a more detailed approach.

We call on ecologists to rethink their models and experiments. This
would enable them to start linking changes in biodiversity to changes in
conditions at the scales of space and time that are relevant to
individual organisms.


OCEAN WEATHER
To get the most detailed picture of conditions across the ocean’s
surface and at depth, physical scientists are starting to combine
high-resolution in situ measurements of temperature, salinity and so on
with satellite data. Remote-sensing and continuous monitoring are
revealing a highly dynamic environment, even in the open and deep
oceans (see ‘Watched waters’).

For instance, circular currents, or eddies, occur throughout the
ocean. Depending on whether they rotate in the same direction as Earth
or counter to it, they can provide conditions that are rich or poor in
nutrients: different habitats for different phytoplankton and other
organisms.

The currents arising from eddies extend down 4,000–6,000 metres to the
abyssal ocean, as ‘benthic storms’. These resuspend sea-floor sediment,
creating nutrient-rich regions at depth. Likewise, tides, storms and
strong currents affect mixing and change buoyancy throughout the water
column, across scales ranging from centimetres to a few metres. This
sets the stage for considerable variation in the amount of
photosynthesizing life through space and time. And that affects entire
food webs.

Nearer to shore, variability is even more dramatic. The temperature
can shift by more than 10 °C in one tidal cycle or as the wind
displaces surface water and cold water wells up from below (upwelling).
Oxygen levels can swing from 0% to 100%, and pH can shift by more than
one unit as microbes use up oxygen and as phytoplankton and plants
generate it. Microsensors placed near organisms such as mussels have
revealed that oxygen, pH and carbon levels can be highly variable, even
on small scales of less than 1 millimetre. Extremes of these variables
far exceed the projections made by the Intergovernmental Panel on
Climate Change under various scenarios for a warming planet.


STORM FORCE 
Why do ecologists generally ignore such ocean weather as a force that
shapes biodiversity?

In our experience, there is a misperception among researchers that
highly localized, rapid changes are irrelevant to understanding or
predicting biological changes in marine systems. Spikes in temperature
and other variables over hours or minutes are often dismissed as being
‘extreme’. A greater barrier, however, is obtaining the relevant data in
a format that is accessible to biologists.

Satellite measurements of global temperature have been collected by
space agencies such as NASA since the 1980s. And today, data on
temperature trends, rainfall, cloud cover and other climate phenomena
can be downloaded easily. They are available at gridded scales of tens
to hundreds of kilometres, and often at yearly or monthly resolutions
(see, for example, https://data.nasa.gov). Averaged ocean-relevant data
tailored for ecological questions are also available from initiatives
such as Bio-ORACLE, run by a team of marine researchers in Belgium,
Portugal and Australia. However, the data generated by high-resolution
ocean monitoring are much harder to access.

[Figure annotation: Fewer deployments after the fall of the Soviet
Union, a major player in ocean research.]

Such monitoring tends to be geographically limited, with the most
intensive surveys occurring in waters where nations have economic
interests and access. Even when the data have been collected, many
ecologists do not have the computational skills or infrastructure to
store and manipulate them. When one of us (A.E.B.) recently requested
data from a national oceanography institute, for instance, an
oceanographer provided a link to many folders. Each folder contained
hundreds of files of temperature and other data collected from
different time periods: too vast a resource to download on a standard
computer.


TURN THE TIDE
This neglect of ocean weather in theory, experimental design and
modelling is hampering progress in at least three ways.


Predictions are wrong. When ecologists try to forecast change by
running experiments or using macroscale, simulation-based models,
physical parameters are generally represented by averages. Such efforts
can generate either overly catastrophic projections or excessively
optimistic ones.
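
As a rough illustration of why averages can mislead (a sketch that is
not taken from the authors' work; the temperature series, the 28 °C
stress threshold and all function names are hypothetical), the Python
snippet below compares a synthetic hourly temperature record with its
own mean. The averaged record never crosses the threshold, whereas the
variable record does so for several hours every day.

```python
import math

def hourly_temperatures(mean_c, daily_swing_c, days):
    """Synthetic hourly series: a sinusoidal daily cycle around a fixed mean."""
    return [
        mean_c + daily_swing_c * math.sin(2 * math.pi * hour / 24)
        for hour in range(days * 24)
    ]

def hours_above(series, threshold_c):
    """Count the hours in which the series exceeds a stress threshold."""
    return sum(1 for t in series if t > threshold_c)

# Hypothetical numbers, chosen only for illustration.
THRESHOLD_C = 28.0
series = hourly_temperatures(mean_c=26.0, daily_swing_c=4.0, days=30)

# Replace every hour by the long-term mean, as an averaged model would.
series_mean = sum(series) / len(series)
mean_only = [series_mean] * len(series)

stress_hours_variable = hours_above(series, THRESHOLD_C)
stress_hours_averaged = hours_above(mean_only, THRESHOLD_C)

print("mean temperature (deg C):", round(series_mean, 2))
print("stress hours, variable record:", stress_hours_variable)
print("stress hours, averaged record:", stress_hours_averaged)
```

Both records share the same mean, so a model driven only by that mean
would report no thermal stress at all in this toy case.
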

Ecologists generally agree, for example, that marine species in the
tropics and poles will be more vulnerable to the effects of a rise in
temperature of 2 °C. Tropical species are already living in the warmest
habitats on the planet, whereas those at the poles have nowhere else to
go. But oceans are not warming evenly across the tropics and poles.
Some areas are even cooling.

A Weddell seal equipped with a sensor for measuring ocean conductivity, temperature and depth.


Heterogeneity is overlooked. Overall, Earth is losing species. Yet
there are huge differences in the rate of loss at the local scale;
biodiversity is even increasing in some places. Certain species,
populations and individuals can adapt and adjust, and this will lead to
surprises. In 2017, hundreds of surveys on the Great Barrier Reef in
Australia before and after a mass bleaching event revealed huge
variability in how fish species responded to the extreme heat. Some
trophic groups, such as herbivores that scrape algae, became less
common on the warmest reefs. For others, such as those that feed on
plankton, warmer temperatures seemed to benefit populations.


Opportunities are missed. Ignoring the variability in ocean systems
could limit conservation and management strategies. For instance, the
concept of climate refugia, where species can shelter from the effects
of climate change, has been considered for cooler terrestrial
landscapes such as mountain valleys and rivers. Yet marine spatial
planning tends to overlook the possibility of refuge sites arising,
say, from the upwelling of cooler waters from depth, or from the shade
provided by coral reefs. This is largely because ecologists lack the
fine-scale data to establish where potential refugia exist.


THE NEXT WAVE
Each stride forward in the physical sciences should translate to
improvements in ecologists’ predictions of biodiversity change. Major
advances in how atmospheric and climate scientists understand ocean
processes are rapidly unfolding as a result of improvements in
ocean-monitoring technologies, as well as in climate models.

Making equivalent progress in the life sciences, in tandem, will
require at least three changes.


Acceptance. Ecologists must embrace the fact that the oceans are
variable, and consider more carefully the limitations and biases
inherent to physical data. Ocean surface temperatures measured by
satellites, for example, shed little light on conditions for organisms
that live at depth.

In practical terms, this means incorporating variability into
ecological models and experiments. This is starting to happen for
terrestrial ecosystems. In 2016, for instance, researchers revealed
that daily fluctuations in temperature are just as powerful a predictor
of changes in the geographical range of frogs, lizards and other
organisms as seasonal variation.


High-performance computing. Ecologists urgently need ways to access
and analyse high-resolution data on environmental variability. They are
used to dealing with megabytes of data, but they need to be able to
handle terabytes.

Currently, there are various options for accessing high-performance
computing. Researchers can apply for cloud-computing grants offered by
Microsoft and Google. And some countries, such as Canada, offer cloud
resources and training to enable academic institutions to embrace
big-data research. The provision of this type of infrastructure and
support should be prioritized more broadly.

Crosstalk and collaboration. Much more dialogue is needed between
ecologists, physiologists and climate and ocean scientists to aid
understanding of what data are required, and in what formats. For
instance, sDiv, the Synthesis Centre of iDiv, the German Centre for
Integrative Biodiversity Research in Leipzig, runs workshops to foster
crosstalk between researchers and kick-start new approaches. This and
hundreds of similar efforts could help to bring the relevant
researchers together. Dedicated funding for working groups, and for
interdisciplinary science in general, will be key.

Only through global collaboration will ecologists be able to obtain a
global perspective on ocean weather. There are already some good models
for this. The Global Ocean Acidification Observing Network (GOA-ON),
for instance, is an international effort to provide highly resolved
biogeochemical data on the scale of metres, to enable researchers to
optimize models of ocean acidification.

We predict that when biologists engage with the physical and
biogeochemical data now becoming available, at scales matched to those
of organisms’ lives, major shifts will occur in how we conceptualize
and manage biodiversity change in the ocean.


Amanda E. Bates is associate professor of ocean sciences at the
Memorial University of Newfoundland, St John’s, Canada, and a Canada
Research Chair in marine physiological ecology. Brian Helmuth,
Michael T. Burrows, Murray I. Duncan, Joaquim Garrabou, Tamar Guy-Haim,
Fernando Lima, Ana M. Queiros, Rui Seabra, Robert Marsh, Jonathan
Belmaker, Nathaniel Bensoussan, Yunwei Dong, Antonios D. Mazaris,
Dan Smale, Martin Wahl, Gil Rilov.
e-mail: abates@mun.ca


BOOKS & ARTS


Musk ox on Russia’s Wrangel Island.


Life in the deep Arctic wild 


Huw Lewis-Jones revels in a study of field biology at the frontiers of climate change. 


[…] the polar bear (Ursus maritimus) […] has become the symbol of the
high Arctic and the poster child for […] share their uncertain future
[…] conservation biologist Joel Berger in Extreme Conservation. Berger
has trailed yaks (Bos mutus) on the Tibetan Plateau and the
proboscis-swinging saiga (Saiga tatarica) through Mongolia’s Gobi
Desert. He has tracked one species for more than a decade. “If polar
bears are the face of climate change,” he writes, “muskoxen are the
heart.” This hairy, horned Arctic bovid (Ovibos moschatus), neither an
ox nor a maker of musk, once roamed with woolly mammoths. Yet we still
know very little about its physiology, reproduction, predation or food
sources; least of all about its adaptive capacity.

Extreme Conservation: Life at the Edges of the World
Joel Berger
[…] Press (2018)


UP CLOSE AND PERSONAL 
To study the musk-ox life cycle demands considerable effort and
expense. The hostile conditions are especially complex in winter, with
severe blizzards and temperatures far below freezing. But by helicopter
and snowmobile, the indefatigable Berger perseveres. He mostly eschews
radio collars for less intrusive data-collection methods, such as
analysing frozen dung for stress hormones, and photogrammetry
(assessing an animal’s health from photographs).

Musk oxen were extirpated from Alaska during the quarter of a century
after the United States purchased the land from Russia in 1867. In the
1930s, efforts began to re-introduce the species to its former range:
juveniles captured in Greenland were transported by ship, rail and boat
by way of Norway to Nunivak Island in the Bering Sea. Descendants of
the 27 animals that completed the trip were then moved throughout
Alaska, and even to Russia’s Wrangel Island, north of Siberia; the
United States offered these animals to the then-Soviet Union during the
cold war as a token of […]. […] to maintain and restore populations of
the enigmatic creatures are now thwarted by the effects of global
warming. Berger has made it his life’s mission to understand the
mechanisms through which these changes affect populations.

He is an excellent guide, a respected ecologist and a gifted
storyteller. It’s an important combination in environmental advocacy:
scientists who can tell compelling stories can elevate their research
outputs. His previous books include Wild Horses of the Great Basin
(1986), Horn of Darkness (1997; written with Carol Cunningham) and the
2008 The Better to Eat You With. In his 2008 book, he tracks cultures
of fear in animals across continents and climates, from elk and wolves
in the US Yellowstone National Park to moose coexisting with tigers and
bears in Asia. He opens our eyes to what it takes for such animals to
cling to the edges of existence.


TRAPPED IN THE ICE
Independent studies by agencies such as NASA and the US National
Oceanic and Atmospheric Administration have shown, year on year, that
global average surface temperature is rising and that warming patterns
are most extreme in the polar regions. One of the most troubling recent
observations in the Arctic is that the number of winter days when
temperatures don’t drop below freezing is increasing. Precipitation
falls as rain, melting snow on the ground; when the real cold returns,
lichens and grasses are encased in steel-hard ice, impenetrable to
hooves. In the early 2000s, Berger relates, such an event on Canada’s
Banks Island led to the deaths of some 20,000 musk oxen. And other
extreme


Books in brief


Milk of Paradise
Lucy Inglis MACMILLAN (2018)

It has ignited brutal wars, addicted millions and transformed
medicine. Opium (the sap of the poppy Papaver somniferum), as cultural
historian Lucy Inglis’s magisterial chronicle reveals, has been used
and abused for millennia, wending its way from ancient Mesopotamia to
Asia and Victorian Britain as a sedative and recreational drug. The
emergence of morphine and heroin in the nineteenth century sparked
pharmacological advances, as well as illicit trade, conflicts and the
US opioid crisis. As Inglis observes, “we must never forget that this
is a battle fought only with ourselves”.


The Consolations of Physics
Tim Radford SCEPTRE (2018)

This beautifully crafted “love letter to physics” by science writer
Tim Radford hinges on the Voyager mission, launched in 1977 to study
the outer Solar System and now heading out to the heliopause and
beyond. Radford (a frequent contributor to these pages) finds solace in
Voyager as a grand cooperative effort in a world too often at war. His
deft narrative interweaves discoveries such as the Higgs boson, the
Hubble Deep Field and gravitational waves with Dante Alighieri’s epic
fourteenth-century poem The Divine Comedy, which intuited the laws of
motion found by Galileo Galilei some 300 years later.




Underbug
Lisa Margonelli FARRAR, STRAUS AND GIROUX (2018)

Termites are not just the destructive force that homeowners know and
hate, “architects of negative space”, as environmental writer Lisa
Margonelli puts it. They also comprise a kind of entomological
three-ring circus, and this round-up of research on the eusocial
insects is a ticket to the show. Their guts teem with wood-degrading
enzymes that could revolutionize biofuels; the convoluted interiors of
their mounds reveal astonishing group behaviours; they even engineer
ecosystems by revitalizing soils. This is a wild ride through hidden
microcosms stretching from Australia to Namibia.


The World in a Grain
Vince Beiser RIVERHEAD (2018)

Our world is built on sand: it’s in everything from silicon chips to
concrete. Currently, the global construction industry consumes some
US$130 billion worth of the stuff annually, so much so that this
ubiquitous natural resource is running out. Journalist Vince Beiser’s
eye-opening study clarifies the science and the huge role of sand in
heavy and high-tech industry. Perhaps most compelling is his exposé of
sand mining, which obliterates islands, destroys coral reefs and marine
biodiversity, and threatens livelihoods. A powerful lens on an
under-reported environmental crisis.


Cornerstones
Mark Smalley (editor) LITTLE TOLLER (2018)

This stirring collection of 22 essays on Earth’s bedrock intermingles
geology and our cultural responses to stone, from flint tools and
megaliths to Gothic cathedrals. Editor Mark Smalley has assembled a
group of stellar contributors. Novelist Sarah Moss muses over igneous
whinstone under Hadrian’s Wall in Northumbria, UK; environmental writer
Jason Mark ponders ethereal Arctic upland at risk from oil drilling;
poet Fiona Hamilton sees bricks as compressed time and energy. Rock,
Smalley reminds us, is both thing and metaphor, helping us feel our way
“towards the intangible”. Barbara Kiser


events are becoming more common. In 2011, more than 50 were entombed in
ice during a Chukchi Sea winter tsunami (ivuniq in the Iñupiat language
of northwestern Alaska); elsewhere, rapid freezing has entrapped whales
and sea otters.

Freak weather and storms of great intensity are aspects of an alarming
reality not confined to the Arctic. Worldwide, biodiversity loss is
accelerating at an unprecedented rate. For Berger and his peers, it has
become a moral obligation to apply their knowledge to ecological
well-being. “Doing science is not conservation,” Berger writes.
“Donning human face, inspiring people to care, engaging people who
listen, and […]ly persuading decision makers […]”

Berger’s methods can be eccentric. He pitches carnivore dung at moose,
baseball-style, to see whether they respond to the scent, or wears a
polar-bear costume […] styrofoam to get close to musk oxen and study
their reactions. ‘[…] academic?’ The Times newspaper once wrote. His
data, gleaned primarily from northern latitudes and extreme heights in
the Himalayas, are in my view all the more insightful for that.

What Berger’s fieldwork shows us is that the more adapted a species
has become to its ecological niche, the more devastating climate change
can be for it. With receding sea ice, polar bears find it more
difficult to hunt seals, their favoured prey, and now forage more
widely onshore, for example on eggs of migratory waterfowl. The melt
shrinks their hunting season and the time they have to rest and breed,
which they normally do on sea ice. “Life at the extremes is more
challenging than ever,” he writes, “and the need for action, for
solutions, has never been greater.”

We need to remember, too, that cold-adapted species have survived
“across thousands of generations”, Berger notes. A fraction of that
time is left for us to curb the impacts of climate change.


Huw Lewis-Jones is an environmental historian and expedition guide,
working regularly across the Arctic and Antarctica.
Twitter: @polarworld


Sex, religion and a
singular anatomist 


Andreas Vesalius’s images baffled many early on, reveal 
Daniel Margécsy, Mark Somos and Stephen N. Joffe. 


The Renaissance anatomist Andreas Vesalius’s De humani corporis
fabrica (‘On the fabric of the human body’) is a foundational work of
medicine in the West. Its more than 200 woodcuts revolutionized how
people pictured the human body, flayed and cut to reveal musculature,
nerves, organs and bones. Even now, 475 years after it was first
published, the bold images of skeletons and skinless ‘muscle men’ in
sinuous poses (by illustrator Jan Steven van Calcar) beguile.

More than 700 copies survive from the 1543 and 1555 editions, which
Vesalius supervised. Of these, roughly two-thirds contain comments in
the margins, bizarre doodles, and coloured-in and even defaced images,
as we reveal in our book The Fabrica of Andreas Vesalius. Early
readers, on this evidence, studied Vesalius’s treatise diligently, yet
had no compunction about scribbling in a hugely expensive volume.

Looking deeper, the marginalia tell two stories. One is that some
found the images baffling, and attempted to clarify them in innovative
ways. Another is that the pious found the figures’ necessary nudity
scandalous, and felt impelled to weigh in with ink and scissors. Our
study of the reactions of hundreds of readers has taught us that
medical communities do not always adopt innovative solutions quickly,
even when they are presented in such an elegant format as the Fabrica.
It takes time to get used to novelty. And we have learnt that even the
most ingenious scientific minds can fail to predict how political and
religious institutions will respond to their work.

The Fabrica’s early readers were the first generation of physicians
and surgeons in Europe to face the daunting task of using detailed
printed images to identify the organs of the body and learn about human
physiology. Vesalius and van Calcar faced challenges of their own. The
Fabrica image of the branching portal vein, which carries blood from
the intestines to the liver, is highly complex, and does not quite
succeed. It is almost impossible, for example, to single out the
haemorrhoidal vein. (At the time, this was important because the vein
was supposed to be the cause of both menstruation and haemorrhoids,
thought to be analogous processes that purged corrupted blood from the
body.) Thus, in a copy now in the library of Queen’s College at the
University of Oxford, UK, someone used a quill and red ink to colour in
this meandering vein, like a child playing a maze game.

In a copy once owned by Nuremberg physician Georg Palma, an intricate
image of the brain is ‘enhanced’. Palma painted six pairs of cranial
nerves different hues in watercolour and used the same colours to
underline the corresponding pairs in the text on the following page.


Even Vesalius realized that his images could be confusing, and devised
an ingenious method to explain them. A letter or number was printed
onto the image of each body part, with a separate key. Unfortunately,
the characters were often too small to pick out against the swirling
background. Some frustrated readers underlined, highlighted, enlarged
or repeated the characters. […] muscle man, for instance, the tiny
character identifying a thigh muscle was barely visible, and a confused
reader queried desperately whether it was the Greek letter ω or the
Roman letter u.

Faced by such challenges, many medics might have given up on the
images. Indeed, when we reconstructed what early modern readers and
scholars found fascinating about the Fabrica, it was evidently the
text. The clear majority of sixteenth- and seventeenth-century readers
who annotated the book focused on that and left no traces of having
engaged with the illustrations. Sixteenth-century reviews of the
Fabrica confirm this impression, because they tended to discuss only
the text.

This is no surprise. The Fabrica’s scholarly readership was trained in
the traditions of Renaissance humanism, which put a strong emphasis on
textual analysis. Even if they found it difficult to interpret visual
information, medical practitioners were expert at making sense of long
Latin texts. Furthermore, the body’s ‘interior universe’ had hardly
been mapped. Even today, it is difficult to make sense of images of
internal organs if you’ve never seen a dissected body, and radiologists
need years of training to interpret X-rays or magnetic resonance
imaging scans.

If images were not that helpful for understanding the body, what was
their purpose? For Church authorities in the period, the answer was
clear. They argued that such figures held an erotic appeal because they
showed the genitals, and so should be censored. The first version of
the Index librorum prohibitorum, the list of books banned by the
Catholic Church, came out in […]; it did not mince words about […]
books. That included tomes on anatomy. Many owners of the Fabrica felt
that they had to paint aprons on the muscle men (as in the copy once
owned by the Jesuit College of Bourges in France), or snip the
offending parts out.

Only a minority of copies of the Fabrica were so treated. We checked
every surviving copy, and found the images intact in the majority owned
by Catholics in the period. That invasive censorship happened at all
signals that, at least until the trials of Galileo Galilei in the early
seventeenth century, the Church found anatomical illustrations more
dangerous than heliocentrism.

The portal vein, with the haemorrhoidal vein coloured in.


CORRECTION
The Books & Arts article ‘Summer books’ (Nature 559, 328–330; 2018)
misnamed the author of On Bullshit; he is Harry Frankfurt.


Shock prize
announcement


This year’s Gruber Cosmology Prize, the most prestigious in the field,
went to the European Space Agency’s Planck satellite observatory team
for its precise measurement of the Universe’s contents and contours.
The US$500,000 prize will be awarded on 20 August in Vienna. The Planck
team has more than 300 members, of whom about one-fifth are women. Yet
the collaboration has indicated that the team’s half share of the prize
money (two principal investigators share the other half) would be
divided between 43 senior members of the collaboration, all of whom are
men. Although the number of recipients has to be limited and the prize
money might end up being pooled, it is remarkable that this situation
has arisen in 2018.

That all Planck’s female scientists have even temporarily been deemed
unworthy of controlling a share of the prize is unwelcome news,
especially to the many of us trying to tackle the under-representation
of women in astronomy.
Olivier Berné Research Institute in Astrophysics and Planetology,
University of Toulouse, France.
olivier.berne@irap.omp.eu


Speed up global ban 
on trans fats in foods 


I suggest that the food industry should be subject to a time limit for
removing hazards identified in the global food system (see L. Haddad
Nature 556, 19–22; 2018). For example, we have known for decades that
industrially processed trans-fatty acids (TFAs) in food are a risk
factor for cardiovascular disease. Although TFAs can be removed from
the food supply efficiently, in many countries these still persist
(see, for example, S. Stender et al. BMJ Open 6, e010673; 2016).

Denmark has been leading the fight against TFAs since 2004. And seven
years have been lost since the European Union issued its food-labelling
regulation in 2011, which would have been an opportunity to tackle
TFAs. Although a TFA ban is still on the EU agenda (see
go.nature.com/2okkfis), taking action is up to individual states: for
example, TFAs are no longer permitted in Slovenia.

By contrast, a ban on partially hydrogenated vegetable oils, the
source of TFAs in food, has just come into effect in the United States;
Canada will follow next month. And the World Health Organization this
year made elimination of TFAs by 2023 the highest priority in its
‘REPLACE’ action programme (see go.nature.com/2vintqp).
Igor Pravst Nutrition Institute, Ljubljana, Slovenia.
igor.pravst@nutris.org
Competing interests declared: see go.nature.com/2nyeaby


Undergrad research: 
begin at the start 


question the common practice 
of training undergraduate 
students in research only during 
their final year at university (see 
J Ankrum Nature hup://dotorg/ 
‘gdwps2; 2018). Foran honours 
thesis,a student in Canada 
typically spends 4 months 
learning technique to inform 

4 months — or around 350 hours 
— ofelfective research. Instead, 
‘my lb trains undergraduates in 
research methods throughout 
their degree, so that they have 
about 1,200-1,600 hours of 
‘hands-on research experience by 
the time they graduate, 

We use a vertical peer-mentoring system in my synthetic-chemistry
lab: graduates and postdocs oversee undergraduate projects, assisted
by experienced undergraduate researchers, who help to train the new
undergraduates. The lab currently hosts 25 undergraduates, with an
annual intake of around 7 promising first-year students. They spend
their first year learning and developing techniques, their second
testing and troubleshooting chemical reactions, their third designing
and implementing small research projects, and their final year
producing their thesis.

John Trant University of 
Windsor, Canada. 
j.trant@uwindsor.ca


Tackling AI impact
on drug patenting


Initiatives are already under way to avoid ill-considered moves
concerning the impact of artificial intelligence (AI) on drug
patenting (see L. Heuer Nature 558, 519; 2018).

Heuer mentions some of the issues. For example, he foresees problems
over whether to designate the algorithm or its programmer as the
inventor, and whether a drug discovered through machine-learning
methods would be patentable.

In the United States, at least, some of these issues are currently
clear. For example, US patent law states that "a person shall be
entitled to a patent" and an algorithm is not a person. It also
states that "patentability shall not be negated by the manner in
which the invention was made".


More generally, it is insufficient to assert that just because an AI
could arrive at a particular solution, then that solution must be
obvious.

However, a serious problem for pharmaceutical companies is that,
according to US law, only people can make the inventive step in
patents. In practice, it is likely that algorithms are making many of
those steps, raising questions about the validity of these patents in
the United States. We welcome efforts to arrive at a consensus over
such dilemmas by the robotics research community (see
go.nature.com/2onhgeb), intellectual-property professionals (see
go.nature.com/2oiwwhdc), the European Commission and the European
Patent Office.

Ross D. King University of Manchester, UK.
Patrick Courtney tec-connection, Konstanz, Germany.
ross.king@manchester.ac.uk


Junior reviewers 
jump into the pool 
As members of the Association of Polar Early Career Scientists
(APECS), we participated in a group review of the upcoming report on
the oceans and cryosphere from the Intergovernmental Panel on Climate
Change (IPCC). Our analysis compared well with that of more senior
scientists (see also L. van der Veer et al. Clim. Change 125,
137-148; 2014). Early-career scientists are an untapped source of
peer reviewers. We think they could be deployed as successfully on
journal manuscripts as on large reports.

We encourage other early-career scientists to engage in individual
and group reviews such as those organized by APECS, including the
second review of the IPCC ocean and cryosphere report taking place
later this year.

Expanding the reviewer pool in this way would benefit the scientific
community by alleviating the review burden (see, for example,
M. Kovanis et al. PLoS ONE 11, e0166387; 2016). There would be career
advantages for junior researchers who become accomplished reviewers.
And they would gain insight into improving the preparation and
presentation of their own papers.

Comprehensive reports such as those compiled by the IPCC provide a
means for the scientific community to reach the public. Such
engagement is becoming increasingly important, so early-career
researchers must learn to contribute to it effectively.
Mathieu Casado* Alfred Wegener Institute Helmholtz Centre for Polar
and Marine Research, Potsdam, Germany.
mathieu.casado@gmail.com
*On behalf of the co-signatories (see go.nature.com/…)
OBITUARY


Paul Delos Boyer 


(1918-2018) 


Nobel laureate in chemistry who discovered nature’s smallest rotary motor. 


Paul Boyer was approaching the finish line of his career when he
risked everything with a jaw-dropping proposal. He addressed one of
the most important, as-then-unanswered questions in biochemistry: how
do cells use an electrochemical gradient to form the molecule
adenosine triphosphate (ATP), which drives the energy-requiring
processes that are essential to life?

The formation of more than 90% of the ATP in our cells is catalysed
by an enzyme called ATP synthase. Boyer's remarkable proposal for how
the enzyme works was that ATP synthase functions as a tiny molecular
motor. Just as electric motors spin when electrons flow through a
potential gradient, flow of protons (hydrogen ions) across an
electrochemical gradient generated by respiration causes the core of
the ATP synthase to spin relative to its surrounding catalytic
subunits. Because the core is asymmetric, its rotation causes the
surrounding subunits to change their shape, thus disrupting the tight
binding site where ATP is formed, allowing its release.

Boyer died on 2 June, eight weeks shy of his 100th birthday. Born in
Provo, Utah, he studied chemistry at Brigham Young University, also
in Provo. After marrying his college sweetheart, Lyda Whicker, he
headed in 1939 to the biochemistry department at the University of
Wisconsin-Madison for his PhD degree. After postdoctoral training at
Stanford University, California, in 1946 he accepted a faculty
position at the University of Minnesota, St Paul. In 1963 he returned
to California to head the biochemistry division of the chemistry
department at the University of California, Los Angeles (UCLA), where
he spent the rest of his career.

Boyer's revelations about ATP synthase occurred in three steps and
involved insights that defied dogma. I was fortunate to be a postdoc
in Paul's lab when he came up with the first of these.

We were attending a UCLA seminar in 1972 when I noticed that he
wasn't paying attention to the speaker. Afterwards, Paul approached
us in a very excited state. This was surprising because he was known
for his calm demeanour. He confessed that he had spent the hour
thinking about old unexplained data. He asked: "What would you say if
I told you that it doesn't take energy to make ATP at the catalytic
site of ATP synthase" (as was universally held at the time) "but
rather that it takes energy to get ATP off the catalytic site?" This
was a eureka moment.

In other words, tightly bound ATP forms spontaneously at catalytic
sites on the synthase. Energy from the electrochemical gradient is
used to alter the surrounding structure, which disrupts the tight
site, allowing ATP release.

As is often the case with transformational ideas, early reactions
were negative. When the Journal of Biological Chemistry rejected our
manuscript containing data supporting this concept, Boyer told me
without animosity that he could see why they would do that: "It was a
very striking claim." The work was published in 1973 in the
Proceedings of the National Academy of Sciences.

A second feature of the mechanism, which he published in a 1977
paper, was recognizing that the multiple catalytic sites of the
synthase coordinated with one another: the finished ATP product is
released only when ATP precursors bind at another, adjacent site.

Finally, in 1981-82, Boyer proposed that rotation of the asymmetric
core of the synthase, driven by an electrochemical gradient, led to
the necessary changes in the surrounding catalytic subunits.

Subsequently, all three concepts received strong experimental support
from numerous labs. This included results of X-ray crystallographic
studies conducted by John Walker at the UK Medical Research Council's
Laboratory of Molecular Biology in Cambridge; he was also a recipient
of the 1997 Nobel Prize in Chemistry.

Boyer had a gift for extracting insights from data. But he also drew
on knowledge acquired over years of reading beyond his field. As
editor of 19 volumes in a series of books called The Enzymes, and as
co-editor and editor of the Annual Review of Biochemistry, he
followed all the major advances in enzymology over several decades.
He also gained an edge by coming to biochemistry as a chemist, not as
a biologist, as was common in the 1940s. His firm understanding of
kinetics and thermodynamics played an important part in his
discoveries.

Boyer was a highly effective trainer of young scientists. Mentoring
by example, he exhibited an incredible work ethic, a love for science
and a pure joy in discovery. He referred to his favourite enzymes as
friends. He set high expectations while providing a supportive and
stimulating environment. Even after you left his lab, he looked for
opportunities to help you advance your career.

Paul was also a role model for how to live 
your life with integrity, respect for others 
and kindness. He showed exceptional 
civility in a field known to be contentious 
(shouting matches were common at scientific
meetings). In all the years we worked
together, I never heard him raise his voice 
in anger or offer a personal criticism of 
any of his competitors. When faced with 
a difficult situation, I still ask myself: how 
would Paul Boyer handle this problem? 
The answer is always: with grace and 
generosity. 


Richard L. Cross is SUNY distinguished professor emeritus of
biochemistry and molecular biology at Upstate Medical University in
Syracuse, New York, USA.
e-mail: crossr@upstate.edu
A new twist on catalytic teamwork


Catalysts working in pairs can promote more-effective reactions than can the same catalysts used sequentially. The
coupling of an enzyme with a light-activated catalyst offers great potential for organic synthesis. See Letter p.355


NICHOLAS J. TURNER 


The development of catalytic reactions is a dominant theme in
chemistry, especially in industry, where major efforts are under way
to develop large-scale chemical processes that are sustainable and
avoid producing unnecessary waste. Chemical reactions can be
accelerated using many types of catalyst, including metals (or their
salts or complexes), small organic molecules, enzymes and
light-activated catalysts. Catalysts of all types have advanced to
the extent that two or more catalysts can be combined to promote
cascade reactions: interconnected transformations, carried out in a
single operation, to yield products with selectivities that would be
difficult to achieve using the catalysts independently in sequential
steps. On page 355, Litman et al. report that the combination of an
enzyme with a light-activated catalyst starts a cascade reaction that
produces compounds that are versatile intermediates for organic
synthesis.

The use of combinations of catalysts could potentially lead to step
changes in the efficiency of chemical processes. Certain combinations
of enzymes with small-molecule organic catalysts or transition-metal
catalysts have been of particular interest, in part because the
chemistry mediated by these different catalyst types is highly
complementary, and also because water is used as the main solvent,
thus avoiding environmentally harmful organic alternatives. Moreover,
such combinations can open up synthetic routes for constructing
molecules that would not otherwise be possible. The emergence of
light-activated catalysts (photocatalysts) in the past few years has
presented opportunities for the development of systems that combine
enzymes with photocatalysts.

Enter Litman et al., who have used just 
such a combination to promote reactions of 
alkenes — organic compounds that contain 
carbon-carbon double bonds. Many alkenes 
can form as isomers, known as (E)- and 
(Z)-isomers, which differ in the geometri- 
cal arrangement of groups attached to their 
double bond (Fig. 1a). Although methods 
exist that allow just one isomer of an alkene 
to be produced during synthesis, it is often
cheaper and easier to prepare alkenes as 
Figure 1 | Combinations of enzymes and light-activated catalysts enable organic reactions. a, Litman
et al. used a light-activated catalyst (a photocatalyst) in situ with an enzyme (ene reductase) to catalyse
the conversion of mixtures of (E)- and (Z)-isomers of alkenes into products that form predominantly
as a single stereoisomer (an isomer that contains a particular spatial arrangement of bonds). The
photocatalyst promotes the interconversion of the isomers, whereas the enzyme reduces only the
(E)-isomer. The hashed bond extends into the plane of the figure. b, A different photocatalyst has
been used to convert imines into free radicals (dot indicates an unpaired electron), which accept a
hydrogen atom from a donor to form amines as a mixture of stereoisomers. The enzyme monoamine
oxidase recycles one stereoisomer back to the imine, which is therefore eventually all converted
into the other stereoisomer. The solid-wedge bond extends out of the plane of the figure. c, Another
photocatalyst promotes the addition of thiols to enones to form ketones, which can be reduced in situ
by a ketoreductase enzyme to form mercaptoalkanols. R groups represent a variety of organic groups.
Ar, aryl group; EWG, electron-withdrawing group.


mixtures of (E)- and (Z)-isomers. But using such mixtures can be
problematic. Alkenes are often chemically reduced during organic
syntheses; that is, the carbon-carbon double bonds are converted into
single bonds. But the reduction of both types of isomer together
typically yields products known as stereoisomers, which have
different spatial arrangements of groups attached to a specific
carbon atom and can be difficult to separate.

The ideal solution to this problem would be to combine a catalyst
that converts (E)-alkenes into (Z)-alkenes (and vice versa) with a
second catalyst that reduces only one of the two alkene isomers, thus
yielding only one stereoisomer. For example, if the second catalyst
reduces only (E)-alkenes, then it initially catalyses the reduction
of any of that isomer found in the original alkene mixture, and then
goes on to reduce any (E)-isomer produced from the (Z)-isomer by the
first catalyst. Both isomers of the alkene mixture are thus
eventually consumed to make the same product. This is exactly what
Litman et al. report in the current work.
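
The funnelling of both isomers into one product can be illustrated with a toy kinetic model. The Python sketch below is an illustration only and is not taken from Litman et al.: it assumes simple first-order kinetics, equal isomerization rates in both directions, a perfectly (E)-selective reductase, and arbitrary rate constants and time units. The names simulate_cascade, k_iso and k_red are hypothetical.

```python
def simulate_cascade(e0=0.5, z0=0.5, k_iso=1.0, k_red=0.5,
                     dt=0.001, steps=50_000):
    """Toy model: (E) <-> (Z) via photocatalyst, (E) -> product via enzyme.

    Hypothetical first-order rate constants in arbitrary units, integrated
    with the forward Euler method. Returns final fractions (E, Z, product).
    """
    e, z, p = e0, z0, 0.0
    for _ in range(steps):
        iso = k_iso * (z - e)   # net (Z) -> (E) flux from the photocatalyst
        red = k_red * e         # the enzyme reduces only the (E)-isomer
        e += (iso - red) * dt
        z -= iso * dt
        p += red * dt
    return e, z, p


if __name__ == "__main__":
    print("photocatalyst + enzyme:", simulate_cascade())
    print("enzyme alone:          ", simulate_cascade(k_iso=0.0))
```

With isomerization switched off (k_iso=0.0), the model leaves the (Z)-isomer untouched and the product plateaus at the initial (E) fraction; with it switched on, nearly all of the material ends up as product.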

The authors' catalytic system builds on previous work that reported
the use of iridium-based photocatalysts to interconvert the (E)- and
(Z)-isomers of a range of different alkenes. In the current study,
Litman and colleagues combined analogues of those iridium
photocatalysts with ene reductase enzymes, which reduce alkenes and
are generally (E)-selective, although the authors also tested a
(Z)-selective ene reductase in their system.

The researchers optimized various parameters of their reactions,
including the concentrations of the iridium catalyst, the enzyme and
the enzyme's cofactor. They developed a system that reduces mixtures
of (E)- and (Z)-isomers of alkenes to form a single stereoisomer, in
multi-milligram quantities (Fig. 1a). The authors went on to convert
the stereoisomer into a variety of biologically active molecules and
key intermediates that have been used to prepare such molecules,
thereby highlighting the potential application of their chemistry for
preparative organic synthesis.

As Litman et al. point out, photocatalytic reactions typically occur
at or near to room temperature, which makes them compatible with the
thermal requirements of enzymatic systems. Photocatalysts also often
work through mechanisms (such as outer-sphere electron transfer and
energy transfer) that generate intermediates that are stable in the
presence of water and tolerant of the chemical groups found in
enzymes. Therefore, photocatalysts in general might be particularly
suitable for being combined with enzymes for synthetic reactions.

This compatibility of photo- and enzymatic processes has been
exploited in two other studies published earlier this year. In the
first, a water-soluble iridium catalyst was combined with the enzyme
monoamine oxidase (MAO-N) to convert racemic mixtures (one-to-one
mixtures of mirror-image stereoisomers known as enantiomers) of amine
compounds into a single enantiomer (Fig. 1b). This process begins by
generating a highly reactive free radical from a starting material
(an imine). The radical is then converted in situ to a racemic
mixture of amines. MAO-N recycles only one of the enantiomers back
into the imine, and the whole process repeats until all of the imine
has been converted into the enantiomer that is not the substrate for
MAO-N. In the second study, a photocatalytic reaction of thiols with
enones was used to generate ketone intermediates that were reduced in
situ with a ketoreductase enzyme, yielding products known as
mercaptoalkanols enantioselectively (Fig. 1c).
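
The recycling loop in the first of these studies converges quickly, which a few lines of arithmetic make concrete. The Python sketch below is an idealized illustration, not the published procedure: it assumes discrete cycles, a perfectly non-selective reduction of the imine to a 1:1 mixture of enantiomers, complete and perfectly selective oxidation by MAO-N, and no side reactions. The function name deracemize is hypothetical.

```python
def deracemize(cycles):
    """Fraction of the desired enantiomer after each idealized cycle."""
    imine = 1.0      # all material starts as the imine
    desired = 0.0    # the enantiomer that MAO-N does not accept
    history = []
    for _ in range(cycles):
        desired += imine / 2   # non-selective reduction gives a 1:1 mixture
        imine = imine / 2      # MAO-N oxidizes the other half back to imine
        history.append(desired)
    return history


if __name__ == "__main__":
    print(deracemize(5))  # [0.5, 0.75, 0.875, 0.9375, 0.96875]
```

Under these assumptions the desired fraction after n cycles is 1 - (1/2)^n, so ten cycles already exceed 99.9%.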

As with all enzymatic systems, the reaction scope and scalability of
Litman and colleagues' transformation will determine the extent to
which it finds practical applications. For example, the alkene
substrates reported in the paper are linear molecules that bear aryl
groups (structural units that contain benzene rings; a substrate
bearing an aryl group known as a pyridine ring is also reported). It
will be interesting to see whether the chemistry can be extended to
cyclic and non-aryl-bearing substrates. Moreover, the concentration
of substrates used in the reactions is currently lower than would be
needed for industrial processes. It remains to be seen whether the
photocatalyst and enzyme will work at industrially useful substrate
concentrations.

Even if the enzyme does not work under conditions demanded by
industry, or for a broad range of substrates, all is not lost.
Techniques such as protein engineering and directed evolution are
increasingly being used to rapidly optimize the characteristics of
enzymes (such as their substrate scope, stability and selectivity) to
make them compatible with industrial processes. Indeed, enzymes are
the ultimate tunable catalysts, and will therefore surely be combined
with many other chemical catalysts in the future.
Birds perceive colours
in categories


Humans perceive colours in categories such as red, even though we can discern


ALMUT KELBER


The amount of information reaching our sensory organs every second
would be overwhelming if it were not for our ability to categorize
it. Colour perception is a good example of this phenomenon. When we
pick strawberries, we can easily discriminate between unripe fruit
and fruit of the many different shades of red that indicate ripeness.
Caves et al. report on page 365 that zebra finches (Taeniopygia
guttata) can also perceive a continuum of colours as belonging to
distinct categories, a phenomenon that affects birds' ability to
distinguish similar colours.

Although we can easily discriminate between the different shades of
ripe strawberries, we tend to generalize and treat these shades as
being equivalent. When comparing colours, if the differences between
them are on the same scale of separation, our ability to perceive
differences between colours from two separate categories, say 'red'
and 'orange', is enhanced compared with our ability to perceive
differences in colours that are both within one of these categories.
This enhanced ability to distinguish between colours if the colours
are in separate categories is called categorical colour perception.

The preconditions necessary for the ability to perceive colours in
distinct categories had already been demonstrated in birds. Humans
and our close relatives have evolved to have three types of
colour-sensing cone cell in the eye, and birds have evolved to have
four types. Birds have impressive colour-discrimination abilities,
including the capacity to perceive the ultraviolet range of the
spectrum. A remarkable earlier study provided clear evidence that
birds can generalize among certain colours, and thus divide the
continuum of the colours that they perceive into discrete categories.
But it was not known whether this ability affects how birds perceive
similar colours and whether it helps them to spot key colour
differences. Caves and colleagues investigated whether birds' ability
to categorize colours affects their colour-discrimination abilities,
and thus whether
these animals have categorical colour perception.

The authors devised an ingenious test. Female zebra finches were
trained on a task in which food was hidden beneath coloured discs.
Food was present beneath bicoloured discs and absent below discs
composed of a single colour. This training scheme allowed the authors
to test how well the birds recognized colour differences by their
ability to identify bicoloured discs when searching for food. The
authors studied a range of colours from orange to red, evenly
dividing this part of the spectrum into eight shades of colour. Caves
and colleagues made great efforts, using physiological models of bird
colour vision, to make all of the steps between the shades
equivalently
Figure 1 | Categorical colour perception in birds. Caves et al. investigated colour perception in
zebra finches (Taeniopygia guttata). The authors found that, like humans, the birds group colours
into categories and this affects their colour-discrimination ability, a phenomenon called categorical
colour perception. a, The authors tested birds' ability to distinguish between neighbouring colour
pairs in eight evenly separated shades on the orange-to-red spectrum. Birds were substantially better
at discriminating between shades 5 and 6 than between other pairs, suggesting that this represents a
colour-category boundary. b, To test whether birds have the capacity for categorical colour perception,
the authors presented birds with a device that had been designed so that the ability to distinguish between
two colours and identify bicoloured discs enabled birds to access a food reward. The device had wells that
contained seeds covered with bicoloured discs, and empty wells that were either uncovered or covered by
single-coloured discs of each of the colours on the bicoloured discs. The finches were less successful at
identifying bicoloured discs if the colours were on the same side of the 5/6 category boundary (such as
colours 6 and 8) than if the colours were in different categories (5 and 7).


sized on the birds' colour scale. These colours are worthy of
attention because the zebra finch beak is red or orange. Beak colour
depends on the amount of astaxanthin pigment deposited, which
reflects the health of an individual's immune system, hence these
colours might provide information about an individual's fitness.
Females seem to be able to discriminate not only between males that
have red or orange beaks, but also between males that have beaks of
differing red shades. However, whether female preference for males
depends on male beak shade is debated.

Caves and colleagues first tested the finches using neighbouring
pairs of shades from their eight-step colour scale and observed that
birds distinguished between two of the shades better than between any
other pair of neighbouring hues. This suggests that a putative border
is present between the red and orange shades. The authors then
investigated whether the birds were better at discriminating between
pairs of colours of a similar level of shade separation that cross
the proposed category boundary, compared with their ability to
discriminate between colour pairs from one side of the category
boundary (Fig. 1). Zebra finches passed this key test, demonstrating
their capacity for categorical colour perception.

This result is fascinating and thought-provoking for many reasons.
Birds are the only animals, besides primates, in which categorical
colour perception has now been demonstrated. More work should be done
to investigate whether other aspects of colour, such as intensity and
spectral purity, influence categorical perception in birds. It would
also be interesting to determine whether zebra finches' ability to
group colours into 'red' and 'orange' has relevance for mate choice.
However, this could be difficult to test because mate selection might
depend on a range of male characteristics such as the rate of male
courtship displays, rather than only beak colour.

The work also has implications for our understanding of human colour
perception. There is an ongoing debate about whether language,
including colour terms such as red, blue, green and yellow,
influences colour perception. One school of thought holds that colour
categories have a cultural and linguistic basis. The hallmark of
categorical perception (faster and more-accurate discrimination of
colours in different colour categories) is seen only if a subject's
language has names for the specific colour categories being compared.

The other school of thought contends that colour perception has a
biological basis that is not dependent on cultural and linguistic
influences. Evidence to support this viewpoint includes the
observation that terms for specific colours cluster around the same
hues across different languages, and the fact that infants can
discriminate between red, green, blue, yellow and purple before they
have learnt the words for these colours. Caves and colleagues'
finding that birds have the capacity for categorical colour
perception adds more evidence to support the biological basis of this
phenomenon.

Why might categorization be important, and how does it fit into the
broader context of signal perception? The term 'categorical
perception' was coined to describe the human ability to distinguish
sounds in discrete units, called phonemes, that help to discriminate
one word from another (such as the sounds 'd', 'b' and 'p' in the
English words bad, bat, pad and pat). Perception of phoneme-like
elements also occurs in other animals, including birds. Categorical
perception could be described as a top-down mechanism to focus on key
sensory cues by separating such signals from the enormous volume of
irrelevant information. Another way to achieve this separation is a
bottom-up approach termed 'matched filter', a concept which proposes
that many animals' sensory organs are designed as filters that
perceive only the range of information that is relevant to the
organism. These two approaches could together enable animals to
handle the vast amount of sensory input that is needed to inform
their choices and behaviours.

The level of contribution of these processes, and how they evolved in
different animal clades, are topics worthy of further study. Caves
and colleagues' work on zebra finches might be the start of a wider
survey of categorical perception of colour in other animals.
Profile of an unknown
airway cell


RNA sequencing of single cells in the mammalian trachea reveals a previously
unknown airway cell that expresses genes involved in fluid and solute balance,
and that might play a part in cystic fibrosis. See Article p.319 & Letter p.377


KYLE J. TRAVAGLINI & MARK A. KRASNOW


Over the past two centuries, light and electron microscopy have
enabled scientists to identify six or more cell types that line the
mammalian airways, transporting oxygen into the body and protecting
us from microbes and particles in the air we breathe. In the past two
decades, the identification of molecular markers for most of these
cell types has provided insight into their functions, and given
researchers and physicians a way to locate and characterize the cells
in clinical specimens. Montoro et al. (page 319) and Plasschaert et
al. (page 377) now describe the full gene-expression profiles of cell
types that line the mammalian trachea. In doing so, they have
discovered a previously unknown cell type that could hold the key to
understanding the disease cystic fibrosis.

The genomics revolution was spurred by the development of techniques
to measure messenger RNA levels for every gene and thus to profile
gene expression across the genome. However, these early approaches
required millions of cells to generate enough mRNA for analysis. As
such, they were limited to producing a profile of average gene
expression among all the cells in a population analysed.

It is only in the past decade that more sensitive methods have become
available to profile individual cells. In the past couple of years,
improvements in cell isolation and in mRNA capture, amplification and
sequencing have allowed single-cell mRNA sequencing (scRNAseq) to be
broadly applied across biology. This technique has begun to reveal
the diversity of cells in developing, mature and diseased tissues,
without any need for prior knowledge of the cells or the ability to
purify individual cell types. Many groups are now reporting
gene-expression profiles for thousands or tens of thousands of
individual cells. These profiles can be used to reveal the
gene-expression programs that govern dynamic biological processes
such as cell differentiation, and to create molecular cell atlases of
entire organs and even whole organisms.

In the current studies, both groups used scRNAseq to analyse tens of
thousands of cells from the lining (epithelium) of the trachea of
mice, and Plasschaert et al. also analysed human airway cells that
had proliferated and differentiated in culture. Their analyses
confirmed gene-expression profiles for the two most common cell
types: club cells, which secrete components of the mucus that lines
the airways, including antimicrobial and immune-modulatory proteins;
and ciliated cells, which carry protruding structures called cilia
that swirl to clear mucus and debris.

In addition, the large number of cells analysed revealed expression
profiles for some rare and less-well-characterized cell types: goblet
cells, which produce mucus proteins; tuft cells, which are thought to
act as immune sensors; and neuroendocrine cells, which sense oxygen
levels, irritants and stretch in the airways, and signal to other
lung cells and the central nervous system. The analyses also
uncovered molecularly distinct subpopulations of club, goblet and
tuft cells.

The two studies then established the gene-expression profile of basal
cells, which are located below the other cells in the epithelium
(Fig. 1) and function as stem and progenitor cells. Montoro et al.
combined scRNAseq with a technique to track basal cells, showing that
basal cells can directly give rise not only to club cells, as
previously shown, but also to all the minor cell types analysed
except goblet cells. Goblet cells, like ciliated cells, seem to arise
from club cells. The groups identified previously unknown molecular
markers for each cell type, along with type-specific combinations of
transcription factors, which are presumably needed to select and
maintain each cell type's distinct properties during airway
development and renewal.

The groups' most surprising and important finding was the discovery
of a previously unknown cell type. The authors dubbed these rare
cells pulmonary ionocytes, because the cells' gene-expression profile
overlaps with that of ionocytes found in fish gills. In fish, these
cells maintain normal solute concentrations by regulating the
exchange of sodium, chloride and calcium ions between the animal's
tissues and the surrounding water. It is not yet clear whether
pulmonary ionocytes serve a similar function in mammalian airways,
although the cells do express multiple ion-transport genes.

In addition, both the fish and the mammalian cells produce a
transcription factor of the Foxi family. In fish, the Foxi protein is
required for fish cells to adopt the characteristics of ionocytes.
Likewise, Montoro et al. found that the Foxi1 gene is necessary for
the expression of ionocyte markers in the mouse trachea, and
Plasschaert et al. showed that the FOXI1 protein governs ionocyte
identity in cultured human airway cells. Notably, both groups report
that Foxi1 controls the expression of the gene cystic fibrosis
transmembrane conductance regulator (CFTR) in pulmonary ionocytes.

Figure 1 | Cells lining the trachea in mice. The inner surface of the
trachea harbours multiple cell types. A protective lining of mucus is
secreted by abundant club cells, rare goblet cells and submucosal
glands (not shown). Ciliated cells bearing protrusions called cilia
slowly propel mucus out of the lung. The surface also contains rare
sensory neuroendocrine (NE) cells, tuft cells and basal progenitor
cells. Two studies have now identified another rare cell type on the
airway surface: the ionocyte. Ionocytes highly express the gene Cftr,
which encodes the CFTR ion channel through which chloride ions (Cl−)
pass from the cell into the mucus, followed by water passage through a
different channel (not shown). Basal cells and club cells also express
Cftr, but at much lower levels than ionocytes. In people with the
disease cystic fibrosis, CFTR is missing or defective, leading to
thickening of the mucus, clogging of the airway, and repeated
infections.

The CFTR protein transports chloride ions out of airway cells, causing
water to flow out and so thinning the airway's lining of mucus
(Fig. 1). When CFTR is absent or inactive, as in people who have
cystic fibrosis, the mucus thickens and accumulates, causing airway
obstruction and repeated infections and inflammation. Determining
which lung cells express CFTR and are directly affected in people with
cystic fibrosis has been difficult because expression of this gene
seems to be complex and variable along the airways. The current papers
demonstrate that the gene's expression is not as random as it had
seemed: the bulk of the CFTR mRNA detected was from the rare pulmonary
ionocytes, each of which highly expresses the gene. People with cystic
fibrosis can also experience gastrointestinal symptoms and
infertility. Perhaps CFTR-expressing ionocytes will be discovered in
organs involved in these problems, too.

Mice harbouring mutations in Cftr do not develop cystic fibrosis, a
curious fact that has long hampered research into the disease. Montoro
et al. found that cultured airway epithelial cells generated from
Foxi1-mutant mice have low Cftr expression but, paradoxically, higher
than normal Cftr activity. Differences in pulmonary ionocytes between
mice and humans, such as compensatory expression of another chloride
channel when Cftr expression is lost in mice, might explain both this
paradox and why Cftr-mutant mice do not model the disease.

Although these results suggest that ionocytes have a key role in
airway biology and cystic fibrosis, much work is still needed to
define their physiological functions, the role of CFTR in these
functions, and how loss of CFTR causes or contributes to disease
symptoms. Developing methods to genetically and pharmacologically
manipulate ionocytes or replace them in model systems, and ultimately
in patients, is another priority.

These papers provide excellent examples of how scRNAseq can transform
long-established views of a tissue and a human disease. As scRNAseq
tools improve and costs continue to drop, we will probably soon
witness something similar for many human organs and diseases.

A practical route to
3D molecular diversity


Cycloaddition reactions are powerful tools for synthesizing three-dimensional
molecules, but their scope has been limited. A creative solution to this problem
opens up opportunities for drug discovery. SEE LETTER P.350


WENBO YE & ANG LI

for making libraries of compounds in drug discovery programs. On page
350, Chen et al. report a strategy that combines cycloadditions with
another type of reaction, known as carbon-carbon cross coupling, to
enable the modular and programmable preparation of
cycloaddition-derived molecules.

Carbon-carbon (C-C) cross-coupling reactions are often used to form
bonds between carbon atoms that are already part of a carbon-carbon
double bond; such carbon atoms are said to have sp² orbital
hybridization. Cross couplings between sp² carbons lend themselves to
the modular synthetic routes used to make compound libraries. This has
resulted in the predomination of sp²-rich structures, which tend to be
two-dimensional, in compounds tested for drug discovery. But 3D
molecular structures can interact with biological targets in different
ways from 2D ones, so are just as important for drug discovery.
Three-dimensional molecules tend to be rich in sp³ carbons, which have
a different orbital hybridization from sp² carbons, and have the
capacity to form four single bonds.

Figure 1 | A powerful strategy for making 3D molecules. Chen et al.
report a practically simple procedure that allows access to products
that cannot be made easily and directly using reactions known as
cycloadditions. a, An anhydride starting material undergoes
cycloaddition to form a 3D scaffold. The ring represents several
different ring structures. b, A desymmetrization reaction then forms
an intermediate that contains a carboxylic acid (CO2H), in which

A cycloaddition known as the Diels-Alder reaction provides one of the
most efficient means of building sp³-rich ring systems. However, to
achieve high reaction yields, stereoselectivity and regioselectivity
(a preference to react at particular atoms) in Diels-Alder reactions,
the electronic properties of the reactants should match. For example,
in conventional Diels-Alder reactions, one of the reactants (known as
a diene) should be electron-rich, whereas the other (the dienophile)
should be electron-poor. This drastically reduces the number and
diversity of Diels-Alder products that can be made. Chemistry students
excitedly studying Diels-Alder reactions are often frustrated when
they realize the restrictions involved. The constraints also prevent
these reactions from being used in modular synthetic routes.
Diels-Alder reactions have therefore been used much less often for
drug discovery than have cross-coupling reactions, and so their
advantages for controlling the stereochemistry (the geometric
arrangement of groups) of ring-containing molecules have not been
fully exploited by medicinal chemists.

Workers from the same group as Chen et al. previously developed a type
of cross-coupling reaction known as radical cross-coupling (RCC). This
is a useful tool for converting carboxylic acids (compounds that
contain CO2H groups) into products that contain alkyl, alkenyl,
alkynyl or aryl groups (hydrocarbon groups that represent all the
possible bonding geometries of carbon atoms). In the current paper,
Chen and colleagues use carboxylic acids as a link that allows them to
combine RCC and Diels-Alder reactions. The connection can be made
because compounds known as anhydrides and esters serve as
electronically favourable dienophiles in Diels-Alder reactions, and
can then be converted into acids to take part in various RCC
reactions.

On the basis of this strategy, the authors devised a simple, modular,
five-step sequence to generate molecules that have 3D structural
complexity (Fig. 1). In the first step, a Diels-Alder reaction
involving an anhydride or ester builds a 3D molecular scaffold. This
is followed by a ‘desymmetrization’ reaction, which generates a
carboxylic acid and sets the absolute stereochemistry in the resulting
product. In the third step, an RCC reaction replaces the acid group
with a molecular appendage, producing an intermediate that is then
hydrolysed to generate a second carboxylic acid. This is used in the
final step: another RCC reaction, which introduces a second appendage.

The two RCC steps allow Diels-Alder-type products to be made that
couldn't be synthesized directly in a Diels-Alder reaction because the
starting materials would have mismatched electronic properties.
Another remarkable feature of this sequence is the clever mechanism of
stereocontrol: the first appendage is attached to the molecular
scaffold at an orientation that is governed by a nearby group produced
during the desymmetrization reaction, and the orientation of the
second appendage is governed by the orientation of the first. The
final product therefore comprises mostly one isomer in which the two
appendages are fixed in what is known as a trans orientation to each
other.

Chen et al. went on to extend this chemistry from Diels-Alder
reactions to three other types of cycloaddition reaction that
construct rings formed of three, four or five atoms. Moreover, one of
the RCC steps could be replaced with a reaction that allowed the
formation of a carbon-nitrogen bond, rather than a C-C bond. The
researchers suggest that bonds from carbon to other types of atom
could also be made, to produce an even more structurally diverse set
of final products.

the absolute stereochemistry (the geometric arrangement of groups) is
fixed. Solid wedge bonds project above the plane of the page; R is
methyl, benzyl or CH2CH2Si(CH3)3, and Si is silicon. c, The acid is
replaced by a molecular appendage in a radical cross-coupling (RCC)
reaction. Broken wedges project below the plane of the page. d, e, A
second acid group is generated (d), and is converted to a different
appendage (R′) in another RCC reaction (e).

The authors showcase their chemistry by using it to make natural
products, pharmaceuticals and key intermediates used in the synthesis
of such compounds. An example is the antipsychotic drug asenapine,
which is usually manufactured as a mixture of two mirror-image isomers
(enantiomers). One of these is more effective as a drug than the
other, but is difficult to synthesize as a single enantiomer. Starting
from a symmetrical anhydride molecule, Chen et al. show that their
strategy can be used to make this enantiomer in a practically simple,
short synthetic route and in good overall yield, offering several
advantages over the previously reported synthesis.

Chen and colleagues' chemistry is easy to carry out in the laboratory.
Our frustrated chemistry students can therefore immediately start to
synthesize compounds that fall outside the conventional scope of
Diels-Alder reactions described in their textbooks. All they have to
do is to collect the anhydrides reported by the authors, the organic
catalyst needed for the desymmetrization, the metal catalysts and the
activating reagent used for RCC, and various other commercially
available molecular building blocks (such as boronic acids). Some of
these items will be used in every reaction sequence, and so the task
would become even easier if a chemical-supplies company could market a
kit that contained these items.

Chen and colleagues' work will redefine how chemists think about
synthesizing Diels-Alder-type products, and will find numerous
applications for synthesizing molecules rich in sp³ carbons for drug
discovery. The remaining challenges mainly concern the limitations of
RCC. For example, the method can currently produce only a trans
arrangement of the two appendages introduced in the RCC steps; to make
the other arrangement, the stereochemical outcome of the second RCC
reaction would need to be changed. If such issues can be addressed,
then Chen and colleagues' reactions will become even more powerful
tools for synthesis than they already are.


Wenbo Ye and Ang Li are at the State
Key Laboratory of Bioorganic and Natural
Products Chemistry, Shanghai Institute of
Organic Chemistry, Chinese Academy of
Sciences, Shanghai 200032, China.

e-mail: ali@sioc.ac.cn


1. Nishiwaki, N. (ed.) Methods and Applications of Cycloaddition
   Reactions in Organic Syntheses (Wiley, 2014).
2. Nicolaou, K. C., Snyder, S. A., Montagnon, T. &

50 Years Ago

Whatever advantages the distant future may offer from the manned
exploration of space, it is clear that the present benefits of the
space race are few and far between. A fortnight's conference is being
held ... under the title "Space Science and Technology – Benefits to
Developing Countries". The introductory pamphlet ... describes in a
style of sustained optimism the cornucopia of technological blessings
which future satellite systems will rain down on the poor and rich
alike. In the near future, reflector satellites will shed light on the
night earth, the pamphlet says, and by providing light for
construction, lumbering, fishing and other outdoor industries, could
conceivably have an important effect on the economic growth of the
developing nations. The pamphlet does not discuss the catastrophic
effect of such satellites on biological rhythms, nor does it explain
in what manner night-time fishing and lumbering will boost any
nation's economy.
From Nature 17 August 1968

100 Years Ago

The entrance of the United States of America into the war has prompted
Mr. A. Hansen to write to Science pointing out that the States possess
no national floral emblem. France has its fleur-de-lis, England the
rose, Scotland the thistle, but America has no flower with which it is
associated in people's minds. Mr. Hansen points out the various
characteristics required for a national flower, and comes to the
conclusion that the columbine, which is in flower from April to July,
is probably the most suitable for the purpose. The correspondence of
the generic name Aquilegia with the Latin name of the eagle is also
considered to be a point in its favour.
From Nature 15 August 1918


Special relativity
validated by neutrinos

Neutrinos are tiny, ghost-like particles that habitually change
identity. A measurement of the rate of change in high-energy neutrinos
racing through Earth provides a record-breaking test of Einstein's
special theory of relativity.

MATTHEW MEWES

The existence of extremely light, electrically neutral particles
called neutrinos was first postulated in 1930 to explain an apparent
violation of energy conservation in the decays of certain unstable
atomic nuclei. Writing in Nature Physics, the IceCube Collaboration
now uses neutrinos seen in the world's largest particle detector to
scrutinize another cornerstone of physics: Lorentz invariance. This
principle states that the laws of physics are independent of the speed
and orientation of the experimenter's frame of reference, and serves
as the mathematical foundation for Albert Einstein's special theory of
relativity. Scouring their data for signs of broken Lorentz
invariance, the authors carry out one of the most stringent tests of
special relativity so far, and demonstrate how the peculiarities of
neutrinos can be used to probe the foundations of modern physics.

Physicists generally assume that Lorentz invariance holds exactly.
However, in the late 1980s, the principle began to be systematically
challenged, largely because of the possibility that it was broken
slightly in proposed theories of fundamental physics, such as string
theory. Over the past two decades, researchers have tested Lorentz
invariance in objects ranging from photons to the Moon.

The IceCube Collaboration instead tested the principle using
neutrinos. Neutrinos interact with matter through the weak force, one
of the four fundamental forces of nature. The influence of the weak
force is limited to minute distances. As a result, interactions
between neutrinos and matter are extremely improbable, and a neutrino
can easily traverse through the entire Earth unimpeded. This poses a
challenge for physicists trying to study these elusive particles,
because almost every neutrino will simply pass through any detector
completely unnoticed.

The IceCube Neutrino Observatory, located at the South Pole, remedies
this problem by monitoring an immense target volume to glimpse the
exceedingly rare interactions. At the heart of the detector are more
than 5,000 light sensors, which are focused on 1 cubic kilometre
(1 billion tonnes) of ice. The sensors constantly look for the
telltale flashes of light that are produced when a neutrino collides
with a particle in the ice.

The main goal of the IceCube Neutrino Observatory is to observe
comparatively scarce neutrinos that are produced during some of the
Universe's most violent astrophysical events. However, in its test of
Lorentz invariance, the collaboration studied more abundant neutrinos
that are generated when fast-moving charged particles from space
collide with atoms in Earth's atmosphere. There are three known types
of neutrino: electron, muon and tau. Most of the neutrinos produced in
the atmosphere are muon neutrinos.

Figure 1 | Propagation of neutrinos through Earth. There are three
known types of neutrino: electron, muon and tau. a, A muon neutrino
produced in Earth's atmosphere can be thought of as the combination of
two quantum-mechanical waves (red and blue) that are in phase: the
peaks of the waves are observed at the same time. If a principle known
as Lorentz invariance were violated, these waves could travel at
different speeds through Earth's interior and be detected in the
out-of-phase tau-neutrino state. b, The IceCube Collaboration reports
no evidence of such conversion, constraining the extent to which
Lorentz invariance could be violated.

Atmospheric neutrinos generated around the globe travel freely to the
South Pole, but can change type along the way. Such changes stem from
the fact that electron, muon and tau neutrinos are not particles in
the usual sense. They are actually quantum combinations of three
‘real’ particles (ν₁, ν₂ and ν₃) that have tiny but different masses.

In a simple approximation relevant to the IceCube experiment, the
birth of a muon neutrino in the atmosphere can be thought of as the
simultaneous production of two quantum-mechanical waves: one for ν₂
and one for ν₃ (Fig. 1). These waves are observed as a muon neutrino
only because they are in phase, which means the peaks of the two waves
are seen at the same time. By contrast, a tau neutrino results from
out-of-phase waves, whereby the peak of one wave arrives with the
valley of the other.

If neutrinos were massless and Lorentz invariance held exactly, the
two waves would simply travel in unison, always maintaining the
in-phase muon-neutrino state. However, small differences in the masses
of ν₂ and ν₃ or broken Lorentz invariance could cause the waves to
travel at slightly different speeds, leading to a gradual shift from
the muon-neutrino state to the out-of-phase tau-neutrino state. Such
transitions are known as neutrino oscillations and enable the IceCube
detector to pick out potential violations of Lorentz invariance.
Oscillations resulting from mass differences are expected to be
negligible at the neutrino energies considered in the authors'
analysis, so the observation of an oscillation would signal a possible
breakdown of special relativity.
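
The two-wave picture above can be made concrete with a small numerical sketch. It assumes, purely for illustration, two equal-amplitude waves that start exactly in phase, so that the chance of detecting the out-of-phase tau state is sin²(Δφ/2) for an accumulated phase difference Δφ. The function names and the sample phase values are hypothetical and are not taken from the IceCube analysis.

```python
import math


def tau_probability(phase_difference: float) -> float:
    """Chance of detecting the out-of-phase (tau) state.

    Assumes two equal-amplitude waves that began in phase (a pure muon
    neutrino) and have drifted apart by `phase_difference` radians.
    """
    return math.sin(phase_difference / 2.0) ** 2


def muon_probability(phase_difference: float) -> float:
    """Chance of still detecting the in-phase (muon) state."""
    return math.cos(phase_difference / 2.0) ** 2


if __name__ == "__main__":
    samples = [
        ("no drift", 0.0),
        ("quarter cycle", math.pi / 2),
        ("half cycle", math.pi),
    ]
    for label, phase in samples:
        print(
            f"{label}: P(muon)={muon_probability(phase):.3f}, "
            f"P(tau)={tau_probability(phase):.3f}"
        )
```

In this simplified model a null result, meaning no detected conversion, corresponds to a phase difference that stayed close to zero over the whole path through Earth.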

The IceCube Collaboration is not the first group to seek
Lorentz-invariance violation in neutrino oscillations. However, two
key factors allowed the authors to carry out the most precise search
so far. First, atmospheric neutrinos that are produced on the opposite
side of Earth to the detector travel a large distance (almost
13,000 km) before being observed, maximizing the probability that a
potential oscillation will occur. Second, the large size of the
detector allows neutrinos to be observed that have much higher
energies than those that can be seen in other experiments.

Such high energies imply that the quantum-mechanical waves have tiny
wavelengths, down to less than one-billionth of the width of an atom.
The IceCube Collaboration saw no sign of oscillations, and therefore
inferred that the peaks of the waves associated with ν₂ and ν₃ are
shifted by no more than this distance after travelling the diameter of
Earth. Consequently, the speeds of the waves differ by no more than a
few parts per 10", a result that is one of the most precise speed
comparisons in history.
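
The order of magnitude of this comparison can be checked with back-of-envelope arithmetic: the largest allowed shift between the two waves, divided by the distance they travel, bounds their fractional speed difference. The sketch below assumes an atomic width of about 10⁻¹⁰ m and a mean Earth diameter of about 12,742 km. Both values, and the function name, are illustrative assumptions rather than inputs from the collaboration's analysis.

```python
ATOM_WIDTH_M = 1e-10         # assumed typical atomic width (about 1 angstrom)
EARTH_DIAMETER_M = 1.2742e7  # assumed mean diameter of Earth


def fractional_speed_difference(max_shift_m: float, path_length_m: float) -> float:
    """Upper bound on the fractional speed difference of two waves that
    drift apart by at most `max_shift_m` over `path_length_m`, assuming
    they travel at very nearly the same speed."""
    return max_shift_m / path_length_m


# One-billionth of an atom's width, the scale quoted in the text.
max_shift = ATOM_WIDTH_M * 1e-9
bound = fractional_speed_difference(max_shift, EARTH_DIAMETER_M)
print(f"fractional speed difference < {bound:.1e}")
```

Under these assumptions the bound comes out below 10⁻²⁶. Because the text says the wavelengths reach less than one-billionth of an atomic width, the actual constraint is tighter than this rough estimate.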

The authors' analysis provides support for special relativity and
places tight constraints on a number of different classes of
Lorentz-invariance violation, many for the first time. Although
already impressive, the IceCube experiment has yet to reach its full
potential. Because of limited data, the authors restricted their
attention to violations that are independent of the direction of
neutrino propagation, neglecting possible direction-dependent
violations that could arise more generally.

With a greater number of neutrino detections, the experiment, or a
larger future version, could search for direction-dependent
violations. Eventually, similar studies involving more-energetic
astrophysical neutrinos propagating over astronomical distances could
test the foundations of physics at unprecedented levels.


GENOME EDITING

Chromosomes
get together

Genome-editing approaches have been used to fuse 16 yeast chromosomes to
produce yeast strains with only 1 or 2 chromosomes. Surprisingly, this fusion has
little effect on cell fitness. SEE ARTICLE P.331 & LETTER P.392


GIANNI LITI 


The genomes of nucleus-bearing organisms are divided into linear
chromosomes. The number of chromosomes ranges from one to hundreds
across species. But why is there such variation? Do specific
chromosome numbers hold an advantage for particular species? Shao
et al. (page 331) and Luo et al. (page 392) independently manipulate
the genome of the budding yeast Saccharomyces cerevisiae by
systematically fusing chromosomes, enabling the researchers to explore
the consequences of chromosome-number reduction.

Normal S. cerevisiae genomes have 16 distinct chromosomes (n = 16),
which range from 230 to 1,532 kilobases in length. To function
correctly, yeast chromosomes need protective structures called
telomeres at both ends, and only one centromere, a region that ensures
the accurate segregation of chromosomes into mother and daughter cells
during cell division. Simply fusing the ends of two chromosomes is
therefore not a viable strategy for reducing chromosome number because
it would produce chromosomes containing two centromeres.

To solve this problem, the two groups used genome-editing tools to
fuse sequences found adjacent to one of the telomeres in each
chromosome, and to simultaneously remove one of the two centromeres
(Fig. 1). Using this approach, they reduced the chromosome number step
by step, producing strains that had progressively lower values of n.
The fusion strains comprised genomic material that is almost identical
to that of normal S. cerevisiae, differing only in chromosome number
and by a few non-essential genes that were deleted during strain
creation.

Luo et al. produced an n = 2 strain containing chromosomes that were
each about 6,000 kb long. However, they were unable to fuse the two
chromosomes into one as part of a viable cell. By contrast, Shao
et al. successfully fused the entire S. cerevisiae genome into a
single chromosome in a functional yeast.
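
The stepwise fusion described above can be sketched as a toy bookkeeping model. It only counts telomeres and centromeres and says nothing about the actual editing chemistry. The class and function names, the equal 750 kb starting lengths and the pairing order are all hypothetical choices made for illustration; real yeast chromosomes differ in length, and the two groups used their own fusion orders.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Chromosome:
    length_kb: int
    telomeres: int = 2
    centromeres: int = 1

    def is_functional(self) -> bool:
        # Per the text: telomeres at both ends and exactly one centromere.
        return self.telomeres == 2 and self.centromeres == 1


def naive_fuse(a: Chromosome, b: Chromosome) -> Chromosome:
    """Join two ends: one telomere is lost from each partner,
    but both centromeres are kept."""
    return Chromosome(
        a.length_kb + b.length_kb,
        a.telomeres + b.telomeres - 2,
        a.centromeres + b.centromeres,
    )


def edited_fuse(a: Chromosome, b: Chromosome) -> Chromosome:
    """Join two ends and also remove one of the two centromeres."""
    joined = naive_fuse(a, b)
    return Chromosome(joined.length_kb, joined.telomeres, joined.centromeres - 1)


def reduce_genome(chromosomes, target_n: int):
    """Fuse chromosomes pairwise until only `target_n` remain."""
    if target_n < 1:
        raise ValueError("target_n must be at least 1")
    queue = deque(chromosomes)
    while len(queue) > target_n:
        first, second = queue.popleft(), queue.popleft()
        queue.append(edited_fuse(first, second))
    return list(queue)


# Hypothetical genome: 16 equal chromosomes of 750 kb (12,000 kb in total).
genome = [Chromosome(750) for _ in range(16)]
two = reduce_genome(genome, 2)
one = reduce_genome(genome, 1)
print([c.length_kb for c in two], all(c.is_functional() for c in two))
print([c.length_kb for c in one], one[0].is_functional())
```

With these assumed inputs the n = 2 genome has two chromosomes of 6,000 kb each, in line with the length quoted in the text, whereas `naive_fuse` on its own yields a chromosome with two centromeres, which the model flags as non-functional.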

Figure 1 | Fusing chromosomes one by one. Two groups have fused all
16 chromosomes (n = 16) of the budding yeast Saccharomyces cerevisiae
to produce strains that have only one or two long chromosomes. In
S. cerevisiae, each chromosome must have protective structures known
as telomeres at both ends, as well as a single structure called a
centromere that is essential for normal chromosome segregation during
cell division. To generate fused chromosomes with this composition,
the groups used genome-editing techniques to cleave sequences found
next to one of the telomeres in each chromosome, and simultaneously
removed one of the two centromeres (just one cleavage site is
sufficient to trigger removal of the entire centromere). They then
fused the cleaved portions. By repeating this process, they reduced
the chromosome number in a stepwise manner, producing yeast cells that
have progressively lower values of n.

Given that each group used similar strategies, it is interesting to
consider why only one of the teams could fuse the final two
chromosomes. A possible explanation is that the groups fused the yeast
chromosomes in different orders and orientations. Perhaps such factors
matter, which could mean that only certain final genome structures are
attainable. In the future, reducing the chromosome number through a
variety of fusion paths might reveal how chromosomal structures affect
cell viability. Another possibility is that mutations introduced
accidentally during the chromosome-fusion experiments affected cell
tolerance to the new genome organization.

Both groups then investigated the biological implications of
chromosome fusion. Overall, organismal traits such as cell growth,
size and shape seem to be buffered throughout the series of fusions.
Notably, the expression of only a few genes was altered considerably
in either the n = 2 or n = 1 strains. Most of the observed increases
in gene expression can be explained by there being fewer genes located
near telomeres, which promote transcriptional silencing.

Such transcriptional stability is in contrast to the widespread
transcriptional variation that is seen when yeast undergoes other
types of chromosomal modification, such as inversions of particular
regions. Shao et al. show that this stability reflects the fact that
there are only modest changes to the intrachromosomal interactions
that usually take place, which can modulate gene expression. However,
the interchromosomal-interaction landscape changes drastically, owing
to the depletion of centromeres, which drive the 3D configuration of
the yeast genome.

The yeast strains generated by the groups are haploid: they contain
only one copy of each chromosome. Haploid yeast reproduce asexually,
but they can also mate through sexual reproduction to form diploid
yeast, which contain two copies of each chromosome. Diploid yeast can
then divide through a process called meiosis to produce haploid spores
that mature into haploid cells. The groups showed that the n = 1 and
n = 2 strains can undergo sexual reproduction, albeit with reduced
efficiency compared with wild-type yeast, and produce spores that are
slightly less viable.

During meiosis, genetic material is exchanged between paired
chromosomes in a process called recombination. Because the genomes of
all cells from a given fusion strain are identical, they lack the
genetic variability that researchers need to map recombination through
the generations. As such, the two groups could not characterize how
chromosome reduction affects recombination. The high spore viability
of each fusion strain indicates that some recombination might occur,
ensuring proper chromosome segregation. However, the greatly reduced
chromosome number essentially eliminates any risk of mis-segregation.

Luo et al. mated strains that had different chromosome numbers, and
then investigated spore viability and production in the resulting
hybrid strains, to determine at what point the fusion strain could no
longer produce viable spores (a phenomenon called reproductive
isolation). As predicted, an increasing difference in chromosome
number had an increasing effect on spore viability until, in hybrids
generated by crossing haploid strains that have n = 16 and n = 8, none
of the spores produced were viable. Moreover, spore production was
arrested when the difference in chromosome number became any larger.

This is unexpected, especially given that 
diploid hybrids that are sterile because of high 
sequence divergence or differently arranged 
genomes between their two sets of chromo- 
somes can progress efficiently through 
meiosis, despite producing inviable spores. The mechanism that
underlies the reproductive isolation seen by Luo and colleagues
remains to be determined. Future work using synthetic genomes, which
can be edited at the single-nucleotide level, will allow the
introduction of genetic variants on both local and genome-wide scales,
enabling the in-depth, systematic analysis of the factors that prevent
species from breeding, as well as the genomic changes that prompt
reproductive isolation.

Both studies concluded that reduced chromosome number causes no major
growth defects when cells are grown under various conditions and
stresses. Small fitness defects were most evident in the n = 1 strain,
consistent with the fact that this chromosome configuration is
challenging to obtain. Although these fitness differences seem mild in
a laboratory setting, they could become more harmful in the natural
environment. Indeed, Shao and colleagues' n = 1 strain was quickly
outcompeted by a normal strain of S. cerevisiae when the two were
cultured together. This is consistent with the idea that the structure
of S. cerevisiae chromosomes has remained highly stable for several
million years, although reductions in chromosome number through
telomere fusion and centromere loss occurred repeatedly over longer
evolutionary timescales.

The short generation time of S. cerevisiae means that, in the future,
the evolution of strains that have a reduced chromosome number could
be tracked in the lab, in long-term experiments that run for months or
years. Such experiments will enable researchers to map adaptive
changes that restore fitness in strains that have a reduced number of
chromosomes, and to accurately measure genome stability in these
yeast.

Beyond the current findings, these engineered yeast strains constitute
powerful resources for studying fundamental concepts in chromosome
biology, including replication, recombination and segregation. The
chromosome-engineering approach might also be applicable to organisms
that have more-complex genomes. However, the presence of highly
complex DNA sequences in the regions that surround telomeres and
centromeres in these organisms will make this a challenging task.


A revised airway epithelial hierarchy
includes CFTR-expressing ionocytes


Daniel T. Montoro, Adam L. Haber, Moshe Biton, Vladimir Vinarsky, Brian Lin, Susan E. Birket,
Feng Yuan, Sijia Chen, Hui Min Leung, Jorge Villoria, Noga Rogel, Grace Burgin, Alexander M. Tsankov,
Avinash Waghray, Michal Slyper, Julia Waldman, Lan Nguyen, Danielle Dionne, Orit Rozenblatt-Rosen,
Purushothama Rao Tata, Hongmei Mou, Manjunatha Shivaraju, Hermann Bihler, Martin Mense,
Guillermo J. Tearney, Steven M. Rowe, John F. Engelhardt, Aviv Regev & Jayaraj Rajagopal


The airways of the lung are the primary sites of disease in asthma and cystic fibrosis. Here we study the cellular
composition and hierarchy of the mouse tracheal epithelium by single-cell RNA-sequencing (scRNA-seq) and in vivo
lineage tracing. We identify a rare cell type, the Foxi1+ pulmonary ionocyte; functional variations in club cells based on
their location; a distinct cell type in high turnover squamous epithelial structures that we term ‘hillocks’; and disease-
relevant subsets of tuft and goblet cells. We developed ‘pulse-seq’, combining scRNA-seq and lineage tracing, to show
that tuft, neuroendocrine and ionocyte cells are continually and directly replenished by basal progenitor cells. Ionocytes
are the major source of transcripts of the cystic fibrosis transmembrane conductance regulator in both mouse (Cftr) and
human (CFTR). Knockout of Foxi1 in mouse ionocytes causes loss of Cftr expression and disrupts airway fluid and mucus
physiology, phenotypes that are characteristic of cystic fibrosis. By associating cell-type-specific expression programs
with key disease genes, we establish a new cellular narrative for airways disease.


The airways conduct oxygen from the atmosphere to the distal gas-exchanging
alveoli and are the loci of major diseases, including asthma,
chronic obstructive pulmonary disease and cystic fibrosis. The predominant
airway epithelial cell types include basal progenitor cells,
secretory club cells and ciliated cells. Rare cell types such as solitary
neuroendocrine, goblet and tuft cells have received less scrutiny, and
their lineage relationships and functions remain poorly understood. Of
note, diseases of the airway occur at distinct proximodistal sites along
the respiratory tree. This phenomenon has been attributed to physical
factors governing the localized deposition of inhaled particulates, toxins,
smoke and allergens. Whether disease heterogeneity also reflects
cellular heterogeneity, which varies along the airway tree, is unknown.
scRNA-seq studies have begun to delineate cell type diversity and
lineage hierarchy in the lung.

Here we combine massively parallel scRNA-seq (also performed
in the accompanying Letter) and in vivo lineage tracing in the adult
mouse tracheal epithelium. The resulting finer taxonomy highlights
new cell types and subtypes, reveals new tissue structures and refines
lineage relations. These findings reframe our understanding of both
Mendelian and complex multigenic airway diseases, including cystic
fibrosis and asthma.


scRNA-seq reveals new disease-associated cell types 
We initially profiled 7,494 EpCAM+ tracheal epithelial cells
from C57BL/6 wild-type mice (n = 4) and Foxj1-EGFP ciliated cell
reporter mice (n = 2), using complementary scRNA-seq approaches:
massively parallel droplet-based 3′ scRNA-seq (k = 7,193 cells) and
full-length scRNA-seq (k = 301 cells; Fig. 1a, Extended Data Figs. 1, 2).
We partitioned the cells profiled by 3′ scRNA-seq into seven distinct
clusters annotated post hoc by expression of known marker
genes (Fig. 1b, Extended Data Fig. 1). Each cluster mapped to known
abundant (basal, club, ciliated) or rare (tuft, neuroendocrine, goblet)
epithelial cell types, except for one cluster (Fig. 1b) that contained
cells with expression profiles similar to those of ionocytes found in
the skin of Xenopus and zebrafish. We also recovered matching
clusters using full-length scRNA-seq of 301 EpCAM+CD45− epithelial
cells from proximal and distal tracheal segments of C57BL/6 wild-type
mice, with the exception of goblet cells (n = 3; Fig. 1a, Extended Data
Figs. 2, 3a).

We identified new consensus markers (Extended Data Figs. 1f, 3b,
Supplementary Tables 1-3) and cell-type-specific transcription factors
(false discovery rate, FDR < 0.01, likelihood-ratio test (LRT), Extended
Data Fig. 3c, Supplementary Table 4). To our knowledge, Nfia is the
first transcription factor that is known to be enriched in club cells. Nfia
regulates Notch signalling, which is known to be required for club cell
maintenance. Ascl1, Ascl2 and Ascl3, also associated with Notch
signalling, are enriched in the rare solitary neuroendocrine cells,
tuft cells and ionocytes, respectively (FDR < 0.0001, LRT). Goblet cells
specifically express Foxq1, which is essential for mucin expression in
gastric epithelia.

Some cell-type-specific markers, including Cdhr3 (ciliated cells)
and Rgs13 (tuft cells), have been identified as risk genes for asthma in
Fig. 1 | A single-cell expression atlas of mouse tracheal epithelial cells.
a, Overview of the analysis. b, t-distributed stochastic neighbour
embedding (t-SNE) of 7,193 3′ scRNA-seq profiles, coloured by cluster
assignment and annotated post hoc. The ionocyte cluster is circled. PNEC,
pulmonary neuroendocrine cell.


genome-wide association studies (GWAS) (Extended Data Fig. 3d-f).
Cdhr3 encodes a rhinovirus receptor and is associated with exacerbations
in severe childhood asthma, suggesting that rhinovirus infection
specifically of ciliated cells may precipitate exacerbations in some
individuals. Rgs13 is associated with asthma and IgE-mediated mast
cell degranulation. Its specific expression in tuft cells implicates these
cells as participants in asthmatic inflammation.

Mucous metaplasia (an excess of mucus-producing goblet cells)
occurs more prominently in distal than in proximal mouse tracheal epithelium
following allergen exposure. Some cell-type-specific expression
programs also vary along the proximodistal axis of the airway
tree. Of 105 genes that are differentially expressed (FDR < 0.05, Mann–Whitney
U test) between distal and proximal club cells (Extended Data
Fig. 4a, Supplementary Table 5), distally enriched Muc5b, Notch2
and Il13ra1 are known to have roles in mucous metaplasia. Indeed,
IL-13-induced mucous metaplasia in cultured epithelia resulted in
greater goblet cell differentiation in distal epithelium (Fig. 2a, Extended
Data Fig. 4b).


A cell population organized in ‘hillocks’

Cellular differentiation during homeostasis is an ongoing, asynchronous
process. We inferred trajectories of cell differentiation using
diffusion maps (Fig. 2b, c, Extended Data Fig. 4c), and characterized
expression programs and transcription factors that vary coherently in
transitional cells that were pseudotemporally ordered along trajectories
that connect basal, club and ciliated cells (Extended Data Fig. 5,
Supplementary Table 7, Methods). One of these trajectories reflects the
known basal-to-club cell lineage path (DC1-DC2, k = 555 cells), but a
second, distinct trajectory connects basal to club cells through a newly
identified transitional cell (DC2-DC3, k= 1,908 cells) that uniquely 
expresses squamous epithelial markers Krt4 and Krt13 (FDR < 10°, 
LRT; Fig. 2b, c). The basal cell differentiation marker Krt8 does not 
distinguish the two paths that culminate in club cells (Extended Data
Fig. 4c). We did not detect any cells transitioning from basal to ciliated
cells (Fig. 2b, c), consistent with the homeostatic production of ciliated
cells from club cells.

Surprisingly, many Krt13+ cells are located in contiguous groups
of stratified cells that lack luminal ciliated cells (Fig. 2d, e). Instead,
luminal cells are Scgb1a1+Krt13+ club cells that lie atop Trp63+Krt13+
basal cells. Graded Trp63 expression extends from basal to suprabasal
strata (Extended Data Fig. 4d). We term these unique structures
‘hillocks’. Labelling with 5-ethynyl-2′-deoxyuridine (EdU) was more
concentrated in hillocks than in normal pseudostratified epithelium,
indicating that hillocks are distinct zones of high turnover (Extended
Data Fig. 4e, f). We generated Scgb1a1-creER/LSL-tdTomato mice to
label all club cells, including hillock club cells. The fraction of labelled
hillock club cells diminished with homeostatic turnover (Extended
Data Fig. 4g), supporting a model in which Trp63+Krt13+ basal cells
rapidly give rise to hillock club cells.


Fig. 2 | Club cell differentiation varies by location. a, Distal epithelia
preferentially give rise to mucous metaplasia. Immunofluorescence
showing cells positive for acetylated tubulin (AcTub, ciliated cells) and
Muc5ac (goblet cells) in cultured epithelia from proximal (top panels) or
distal (bottom panels) trachea stimulated with recombinant IL-13 (right)
versus control (left). Scale bar, 200 µm. b, c, Differentiation trajectories.
Diffusion map embedding (b) of 6,905 basal (blue), club (green) and
ciliated (red) cells coloured by cluster assignment (top) or expression
(log2(TPM+1), colour bar) of Krt13 (bottom). c, Number of individual
cells associated with each trajectory. d, e, Krt13+ cells occur in hillock
structures. d, Whole-mount stain of Krt13 (magenta) and acetylated
tubulin (green), n = 3 mice. Scale bar: 500 µm (main), 50 µm (expanded
inset). e, Schematic of squamous hillocks within pseudostratified ciliated
epithelium.


Hillock cells express regulators of cellular adhesion and squamous
epithelial differentiation (Ecm1, S100a11 and Cldn3), and genes associated
with immunomodulation and asthma (Lgals3 and Anxa1; FDR
<10-" LRT; Extended Data Fig. 4h, i, Supplementary Table 6). Overall,
hillocks are characterized by rapid cellular turnover, squamous barrier
function and immunomodulation.


Lineage tracing coupled to cellular dynamics 
To monitor the generation of differentiated cell types, we developed
‘pulse-seq’, a novel assay that couples scRNA-seq and in vivo genetic
lineage tracing over time (Extended Data Fig. 6a). We generated inducible
Krt5-creER/LSL-mT/mG mice to label basal cells and their progeny
with membrane-localized EGFP (mG), whereas non-lineage-labelled
cells express membrane-localized tdTomato (mT). Following induction
with tamoxifen, we profiled 66,265 mG+ (Supplementary Fig. 1) and
mT+ cells by scRNA-seq at days 0, 30 and 60 of homeostatic turnover
(n = 9 mice; three per time point). We identified the seven epithelial cell
types and a population of proliferating cells, which were predominantly
basal cells (Fig. 3a, Extended Data Fig. 6b). We calculated the fraction
of lineage-labelled cells of each cell type at each time point (Fig. 3b, c)
and estimated the daily labelling rate of each by quantile regression
(Fig. 3d, Extended Data Fig. 6c, Methods). We then interpreted these
data in the context of prior basal cell lineage traces in which club cells
are labelled before ciliated cells, consistent with club cells being the
direct parents of ciliated cells at homeostasis.

Initially, basal cells were specifically labelled (64.2%) with only infrequent
labelling of rare cell types (<1.8%) and club cells (3.3%, n = 3
mice, Fig. 3b, c). Labelled club cells reflect a small population of transitional
basal cells as they convert from a basal to club cell fate (Extended
Data Fig. 7e, f). The fraction of labelled cells among basal cells remains
unchanged over time, consistent with self-renewal (Extended Data
Fig. 6d). By contrast, the lineage-labelled fractions of tuft cells, neuroendocrine
cells and ionocytes substantially increased (Fig. 3c), consistent
with ongoing turnover. Rare-cell-type fractional lineage labelling
Fig. 3 | Tracking differentiation dynamics with pulse-seq. a, b, Pulse-seq
tracks the sequential lineage labelling of each cell type. t-SNE visualization
of 66,265 cells coloured by cluster assignment (a) or lineage label over
time (b). mT, membrane tdTomato; mG, membrane EGFP. c, mG+
lineage-labelled percentages of each tracheal epithelial cell type. Points
represent the percentages of lineage-labelled cells for individual mice; bars
show overall estimate; n = 3 mice per time point (x axis). Error bars, 95%


approximates that of club cells at days 30 and 60 (Fig. 3c, d), suggesting
that these rare cell types, as with club cells, are immediate descendants
of basal cells. This confirms a previous suggestion that solitary neuroendocrine
cells are derived from basal cells. By contrast, goblet and

Fig. 4 | Tuft and goblet cell subtypes display unique functional gene
expression programs. Tuft-1 and tuft-2 subclusters. a, Relative expression
(row-wise Z score of log2(TPM+1)) of genes that are differentially
expressed (FDR < 0.25, LRT) between tuft cells of each subtype.
b, Immunofluorescence of pan-tuft marker Trpm5 (blue) and tuft-1
(Gng13+, green) or tuft-2 (Alox5ap+, magenta) specific markers (cells are
outlined) in vivo. DAPI, grey; n = 3 mice, four replicate trachea sections
were examined for each mouse. Scale bar, 5 µm. c, Distinct expression
programs in tuft-1 and tuft-2 cells. Differential expression in tuft cell
subtypes for all genes (left), taste genes (centre) and leukotriene synthesis
genes (right). Labelled genes are differentially expressed (FDR < 0.01,
LRT); k = #92 cells; n = 15 mice. d, Immunofluorescence validation of
goblet-1 (Tff2, magenta) and goblet-2 (Lipf, green) cells (solid outlines).
DAPI, blue; n = 3 mice, four replicate trachea sections were examined for
each mouse. Scale bar, 10 µm.
CI. P values are indicated, LRT. PNEC, neuroendocrine; PI, pulmonary
ionocyte. d, Estimated daily rate of lineage labelling (Extended Data
Fig. 6c) by cell type. n = 9 mice; error bars, 95% CI. P values are indicated,
rank test. e, Lineage validation in situ. Representative images of unlabelled
(dashed outline) and basal lineage-labelled (solid outline) Gnat3+ tuft
cells. Scale bar, 10 µm. f, Cell type lineage and cellular dynamics inferred
using pulse-seq.


ciliated cells were labelled at a substantially lower rate (Fig. 3d), consistent
with a model in which stem cells first produce club cells that, in
turn, later produce goblet cells and ciliated cells.

We confirmed our lineage model with conventional in vivo lineage
tracing with basal and club cell drivers. Over a 30-day basal cell lineage
trace with Krt5-creER/LSL-tdTomato mice, the proportion of lineage-labelled
tuft cells markedly increased (Fig. 3e, Extended Data Fig. 6e),
whereas club cell lineage tracing with Scgb1a1-creER/LSL-tdTomato
mice over the same time period labelled few of the tuft cells, and even
fewer ionocytes or neuroendocrine cells, indicating that basal cells
are the predominant source of these rare cell types (Fig. 3f, Extended
Data Fig. 6f-h). We also investigated the turnover of the hillock
club cells, identified by club cell sub-clustering (Extended Data Fig. 7a, b).
The fraction of labelled hillock club cells grew more rapidly than
the fraction of total labelled club cells (Extended Data Fig. 7c, d, g),
consistent with the rapid turnover of hillocks.


Distinct subsets of tuft and goblet cells 

Tuft cells express the largest number of specific G-protein-coupled
receptors and taste receptors, consistent with a sensory function
(FDR < 0.001, LRT; Extended Data Fig. 8a, b, Supplementary Table 4).
Airway tuft cells express the alarmins Il25 and Tslp (FDR < 10~*,
Extended Data Fig. 8c), which initiate type-2 immunity in the gut,
and possess lateral cytoplasmic extensions (Extended Data Fig. 8) that
may extend their chemosensory span.

Next, we separately re-clustered each rare cell type after aggregating
both droplet-based datasets (Figs. 1b, 3a, n = 15 mice). Tuft cells
partitioned into three clusters: immature tuft, tuft-1 and tuft-2 cells
(Fig. 4a, Extended Data Fig. 8e, f, Supplementary Table 8). Tuft-1 cells
expressed genes associated with taste transduction (P = 2.07 x 10”,
hypergeometric test), whereas tuft-2 cells expressed genes that mediate
leukotriene biosynthesis, notably Alox5ap (P = 3.13 x 10~*, hypergeometric
test), which are central mediators of inflammation and asthma
(Fig. 4a-c). As in the gut, tuft-2 cells are also enriched for immune
cell-associated Ptprc (CD45, FDR = 0.064, LRT). Both tuft cell subsets
are generated at similar rates by basal cells (Extended Data Fig. 8g),
but canonical tuft cell transcription factors are associated with specific
subsets of tuft cells, including Pou2f3 (tuft-1) and Gfi1b, Spib and Sox9
(tuft-2, FDR < 0.01, LRT, Extended Data Fig. 8h).
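
The hypergeometric test cited above for gene-set enrichment has a compact closed form, sketched here with only the standard library. The function name, argument names and toy numbers are hypothetical, and the sketch makes no claim about the gene universe or gene sets the authors used; it only shows how an upper-tail hypergeometric P value is obtained from four counts.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes in the background universe
    annotated:  genes in the universe that belong to the gene set
    drawn:      genes in the list being tested (for example, subtype markers)
    overlap:    tested genes that also belong to the gene set
    """
    max_overlap = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 background genes, 5 in the gene set,
# 4 marker genes tested, 2 of which fall in the gene set.
p_value = hypergeom_enrichment_p(population=20, annotated=5, drawn=4, overlap=2)
print(f"P = {p_value:.4f}")
```

A small P value indicates that the overlap between a subtype's marker genes and an annotated pathway is larger than expected from drawing the same number of genes at random.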

The most highly enriched marker across goblet cells was
Gp2 (Extended Data Fig. 1e, Supplementary Table 1), a marker of
intestinal M cells associated with mucosal immunity. Goblet cells
Fig. 5 | The pulmonary ionocyte is a novel mouse and human
epithelial cell type that specifically expresses CFTR. a, Mouse
pulmonary ionocyte markers. Expression level of ionocyte markers
(FDR < 0.05, LRT, 3′ scRNA-seq dataset) in each airway epithelial cell
type. Sb was formerly known as Gm933. b, Immunofluorescence
co-labelling of EGFP (Foxi1) ionocytes (solid outline), Atp6v0d2 (left)
and Cftr (right). DAPI, blue; n = 3 mice, four replicate trachea sections
were examined for each mouse. Scale bar: 10 µm (left), 5 µm (right).
c, t-SNE plot of 66,265 pulse-seq cells and ionocyte subset (black box,
inset) coloured by expression of ionocyte markers Foxi1 (left) and Cftr
(right). d, qRT-PCR confirms ionocyte enrichment of Cftr. Expression
(ΔΔCt, Supplementary Table 12) of ionocyte (Cftr, Foxi1) and ciliated cell
(Foxj1) markers (x axis) in ionocytes and ciliated cells isolated from
Foxi1-EGFP (n = 4, dots) and Foxj1-EGFP.

Samples are normaliz 

(= 6). Error bars, 9: 


partitioned into three subsets, immature goblet, goblet-1 and goblet-2
(Extended Data Fig. 8i-l, Supplementary Table 8). Goblet-1 cells
are enriched for the expression of genes encoding key mucosal
proteins (Tff1, Tff2 and Muc5b, FDR < 0.001, LRT) and secretory
regulators (for example, Lman1l or P2rx4, FDR < 0.1, LRT). We
confirmed the co-expression of Tff2 and Muc5ac in goblet-1 cells by
antibody staining (Extended Data Fig. 8m). Goblet-2 cells specifically
express Dcpp1, Dcpp2 and Dcpp3, orthologues of ZG16B, which codes
for a lectin-like secreted protein that aggregates bacteria, and Lipf,
a secreted gastric lipase that hydrolyses triglycerides. We identified
unique Tff2+ goblet-1 and Lipf+ goblet-2 cells by immunostaining
(Fig. 4d).


Foxi1+ pulmonary ionocytes highly express Cftr
We confirmed that ionocytes are a newly identified cell population in
vivo using transgenic Foxi1-EGFP reporter mice and Foxi1 immunoreactivity.
EGFP (Foxi1) co-localizes with global airway markers (Sox2
and Ttf1), but not markers of the other cell types (Extended Data
Fig. 9a). On average, we detected 1,038 ± 501 ionocytes in the surface
epithelium of each mouse trachea (n = 3 mice, Extended Data Fig. 9b),
accounting for <1% of epithelial cells.

Pulmonary ionocytes specifically express the V-ATPase-subunit
genes Atp6v1c2 and Atp6v0d2 (FDR < 0.0005, LRT, Fig. 5a, Extended
Data Figs. 3b, 9c, Supplementary Table 1) and are uniquely immunoreactive
for Atp6v0d2 (Fig. 5b). This profile resembles that of Xenopus
and zebrafish skin ionocytes, in which Foxi1 orthologues specify
cell identity and regulate V-ATPase expression. Mouse Foxi1 also
controls the expression of V-ATPase, which is important for ion transport
and fluid pH, in specialized cells of the inner ear, kidney and
epididymis. Like zebrafish ionocytes, pulmonary ionocytes extend
e, Foxi1 knockout decreases expression of ionocyte transcription factors
and Cftr in air-liquid interface (ALI)-cultured epithelia. Expression
(ΔΔCt, Supplementary Table 12) of ionocyte markers in heterozygous
(Foxi1+/−, n = 4) and homozygous knockouts (Foxi1−/−, n = 6),
normalized to wild-type littermates (n = 8). Error bars, 95% CI; P values
are indicated, Holm-Sidak test. f, Foxi1 knockout disrupts mucosal
homeostasis in ALI-cultured epithelia. Effective viscosity (left) and ciliary
beat frequency (right) assayed with µOCT in homozygous Foxi1(KO)
(n = 9, dots) versus wild-type (WT) littermates (n = 9 mice). Bars show
mean. P values are indicated, Mann-Whitney U test. g, h, Human
pulmonary ionocytes are the main source of CFTR in human bronchial
epithelium. Human ionocytes detected by fluorescent in situ hybridization
of FOXI1 and CFTR in bronchi (g; n = 3 bronchi). t-SNE of 78,217
droplet scRNA-seq profiles (points) from human bronchial epithelium
(h; n = 1 patient), coloured by expression of FOXI1 (left) and CFTR
(right). Scale bar, 10 µm.


lateral processes (Extended Data Fig. 9d) that may be involved in
chemosensation or cell-to-cell communication.

Pulmonary ionocytes specifically express the cystic fibrosis transmembrane
conductance regulator (Cftr) gene (FDR = 0.00103, initial
droplet data; FDR = 0.000361, pulse-seq; LRT, Fig. 5a, c, Extended
Data Figs. 3b, 9c, Supplementary Tables 1-3). Ionocytes comprise only
0.42% of the mouse cells profiled by scRNA-seq, but express 54.4% of
all detected Cftr transcripts. For comparison, the vastly more abundant
ciliated cells express 1.5% of total Cftr transcripts. Additionally,
EGFP (Foxi1)+ ionocytes were specifically labelled by Cftr antibody
(Fig. 5b). We further confirmed ionocyte-specific enrichment of Cftr
by quantitative PCR with reverse transcription (qRT-PCR) analysis of
the mRNA of prospectively isolated populations of primary ionocytes
and ciliated cells (191.6-fold enrichment) or bulk EpCAM+ epithelial
cells (158.1-fold enrichment, Fig. 5d, Supplementary Table 12).
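
The contrast drawn above, between the share of profiled cells that are ionocytes and the share of Cftr transcripts they contribute, amounts to two ratios over the same annotated count table. A minimal sketch follows, assuming each cell is represented as a cell-type label plus a dictionary of transcript counts; the function name, the data layout and the toy values are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def cell_and_transcript_shares(cells, gene):
    """Return (share of cells, share of `gene` transcripts) per cell type.

    `cells` is a list of (cell_type, counts) pairs, where `counts` maps
    gene names to transcript counts detected in that single cell.
    """
    n_cells = defaultdict(int)
    n_transcripts = defaultdict(int)
    for cell_type, counts in cells:
        n_cells[cell_type] += 1
        n_transcripts[cell_type] += counts.get(gene, 0)
    total_cells = sum(n_cells.values())
    total_transcripts = sum(n_transcripts.values())
    if total_cells == 0 or total_transcripts == 0:
        raise ValueError("need at least one cell and one detected transcript")
    cell_share = {t: n / total_cells for t, n in n_cells.items()}
    transcript_share = {t: n / total_transcripts for t, n in n_transcripts.items()}
    return cell_share, transcript_share


# Hypothetical toy data: one ionocyte, four ciliated cells, five basal cells.
toy_cells = (
    [("ionocyte", {"Cftr": 6})]
    + [("ciliated", {"Cftr": 1})] * 4
    + [("basal", {})] * 5
)
cell_share, cftr_share = cell_and_transcript_shares(toy_cells, "Cftr")
print(cell_share["ionocyte"], cftr_share["ionocyte"])
```

In the toy table the single ionocyte makes up a tenth of the cells but carries most of the Cftr counts, which is the same kind of disproportion the text reports for the real data.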

We detected ionocytes in mouse submucosal glands, structures associated
with cystic fibrosis pathogenesis, and in nasal and olfactory
epithelia (Extended Data Fig. 9e-g). Ionocytes specifically express
cochlin (Supplementary Table 1), a secreted protein that confers antibacterial
activity against the two most prominent pathogens in cystic
fibrosis lung disease. Using Foxi1 knockout (Foxi1(KO)) mouse
epithelia we show that Foxi1 is required for expression of the ionocyte
transcription factor Ascl3 (96.3% reduction) and the majority of Cftr
expression (87.6% reduction, Fig. 5e, Supplementary Table 12). Mouse
epithelia deficient in Ascl3 display moderately reduced Foxi1 and Cftr
expression (Extended Data Fig. 10a).


Ionocytes regulate airway surface physiology 
Tight control of airway surface liquid (ASL) and mucus viscosity is
necessary for effective mucociliary clearance and is disrupted in cystic
Fig. 6 | Lineage hierarchy of the airway epithelium. Specific cells are
associated with novel cell-type markers, pathways and diseases.


fibrosis. We assessed ASL height, mucus viscosity and ciliary beat
frequency in polarized Foxi1(KO) mouse airway epithelia using live
imaging by micro-optical coherence tomography (µOCT) and particle-tracking
microrheology (Methods). We found increased reflectance
intensity (Extended Data Fig. 10b) and increased effective viscosity
of airway mucus (Fig. 5f) in Foxi1(KO) mice, consistent with animal
models of cystic fibrosis. Ciliary beat frequency also increased in
the Foxi1(KO) epithelium (Fig. 5f), consistent with a response to an
elevated mechanical load due to the increased mucus viscosity.
As with some mouse Cftr(KO) models, neither depth nor pH
(Extended Data Fig. 10c, d) of the ASL was significantly altered in
Foxi1(KO) epithelial cultures.

We also tested whether Foxi1(KO) epithelia produce abnormal
forskolin-induced and CFTR inhibitor (CFTRinh-172)-blocked equivalent
currents (Ieq) in Ussing chambers (Methods). Foxi1(KO) mouse
epithelium lacks Cftr (Fig. 5e) yet displayed increases in CFTRinh-172-inhibitable
forskolin currents (Extended Data Fig. 10e, f), similar to the
compensatory currents observed in Cftr-mutant mice.

We further investigated the role of Foxi1 in ferrets, a species that
models cystic fibrosis well. CRISPR-dCas9-VP64-p65-mediated
transcriptional activation of Foxi1 (Foxi1(TA)) increased airway
epithelial expression of Cftr and other ionocyte genes (Extended Data
Fig. 10g). Foxi1(TA) cultures displayed increased forskolin-induced
short-circuit currents (ΔIsc) and CFTR inhibitor (GlyH-101)-induced
ΔIsc relative to mock-transfected controls (Extended Data Fig. 10h, i).
Therefore, Foxi1 regulates CFTR expression and function in ferret airway
epithelium.


Human pulmonary ionocytes are CFTR-rich 
Human pulmonary ionocytes are the major source of CFTR expression
in the airway epithelium. We detected rare FOXI1+CFTR+ cells
in human bronchi using RNA fluorescent in situ hybridization (Fig. 5g).
Additionally, we detected 765 ionocytes by unsupervised clustering of
87,285 primary human airway cells analysed by scRNA-seq (unpublished
data). Human ionocytes comprise 0.5-1.5% of epithelial cells
along the conducting airways (Supplementary Table 10) and specifically
express FOXI1, ASCL3 and CFTR (FDR< 10" LRT; Fig. 5h, Extended
Data Fig. 10j, Supplementary Table 11), whereas scattered basal and
secretory cells express low levels of CFTR. FOXI1 transcriptional activation
increases ionocyte-specific gene expression in human airway
epithelial cultures.


Discussion 
Our single-cell atlas of mouse tracheal epithelium identified: (1) a new
cell type, the ionocyte; (2) new subclasses of disease-relevant tuft and
goblet cells; and (3) novel transitional cells arranged in discrete high-turnover
structures that we named ‘hillocks’ (Fig. 6). Our pulse-seq
analysis further illuminated the differentiation dynamics of this new
hierarchy of cells. The analysis revealed a simple model of epithelial
turnover in which solitary neuroendocrine cells, tuft cells, ionocytes
and club cells are all produced at the same rate by basal cells. We speculate
that the high-turnover hillocks represent injury-responsive structures
that couple immunomodulation and barrier function.

The pulmonary ionocyte bears the hallmarks of an ancient prototype
cell. The ionocyte occurs in animals as distinct as fish, frog and human,
and is associated with a particular physiologic function: fluid regulation
at the epithelial interface. We show that Foxi1+Cftr+ ionocytes reside
at multiple levels of the airway tree and that they are responsible for
the majority of Cftr expression. Indeed, we demonstrate that they need
to function correctly to maintain airway surface physiology, including
mucus viscosity.

Increased forskolin-inducible currents in Foxi1(KO) mice are consistent
with the compensatory activation of forskolin-inducible currents
in large airway epithelia of Cftr-mutant mice. These currents may
moderate the severity of the mouse cystic fibrosis phenotype, and the
channels responsible could serve as therapeutic targets.

Since human pulmonary ionocytes express higher levels of CFTR
than any other large airway cell type, the current understanding of
the cellular basis of cystic fibrosis is likely to be incomplete. Previous
studies have shown that whereas the nasal epithelia of Cftr-null mice
phenocopy ion transport abnormalities of human cystic fibrosis
airways, expression of CFTR in ciliated cells does not rescue these
abnormalities. Taken together with our findings, this suggests that the
correct cellular context of CFTR expression may be required for proper
CFTR function. As we show that existing ionocytes are replaced by
new ionocytes generated from basal stem cells, we speculate that these
basal cells are the appropriate long-lasting cellular targets for cystic
fibrosis gene therapy. Studies of single-cell expression patterns in cells
from human patients with cystic fibrosis will help further address these
questions.

In sum, we present a new cellular narrative of airways disease, in 
which specific cell types and subtypes are associated with particular 
disease genes. Because lineage paths and cell states may be substantially 
altered in disease states, comprehensive cell atlases of both healthy and 
diseased human lung are needed as a prelude to reframing the biology 
and pathobiology of the lung and its diseases.
METHODS
Mouse models. The MGH Subcommittee on Research Animal Care approved
animal protocols in accordance with NIH guidelines. Krt5-creER and Scgb1a1-creER
mice have previously been described. Foxi1-EGFP mice were purchased
from GENSAT. C57BL/6J mice (stock no. 000664), LSL-mT/mG mice (stock
no. 007676), LSL-tdTomato mice (stock no. 007914), Ascl3-EGFP-Cre mice (stock
no. 021794) and Foxi1(KO) mice (stock no. 024173) were purchased from the
Jackson Laboratory. To label basal cells and secretory cells for in vivo lineage traces,
we administered tamoxifen by intraperitoneal injection (3 mg per 20 g body weight)
three times every 48 h to induce the Cre-mediated excision of a stop codon and
subsequent expression of tdTomato. For pulse-seq experiments, we administered
tamoxifen by intraperitoneal injection (2 mg per 20 g body weight) three times
every 4 to induce the Cre-mediated excision of a stop codon and subsequent
expression of EGFP. To label proliferating cells, we administered 5-ethynyl-2′-deoxyuridine
(EdU) per 25 g mouse by intraperitoneal injection (2 mg per 20 g
body weight). Six-to-twelve-week-old mice were used for all experiments. Male
C57BL/6 mice were used for the full-length and initial scRNA-seq experiments.
Both male and female mice were used for lineage tracing and pulse-seq experiments.
We used three mice for each lineage time point.
Littermate mice were assigned into groups on the basis of genotype.
Immunofluorescence, microscopy and cell counting. Tracheae were dissected
and fixed in 4% PFA for 2 h at 4 °C followed by two washes in PBS, and then embedded
in OCT. Cryosections (6 µm) were treated for epitope retrieval with 10 mM
citrate buffer at 95 °C for 10-15 min, permeabilized with 0.1% Triton X-100 in
PBS, blocked in 1% BSA for 30 min at room temperature (27 °C), incubated with
primary antibodies for 1 h at room temperature, washed, incubated with appropriate
secondary antibodies diluted in blocking buffer for 1 h at room temperature,
washed and counterstained with DAPI.

In the case of whole-mount trachea stains, tracheas were longitudinally re-sectioned
along the posterior membrane, permeabilized with 0.3% Triton X-100
in PBS, blocked in 0.3% BSA and 0.3% Triton X-100 for 120 min at 37 °C on an
orbital shaker, incubated with primary antibodies for 12 h at 37 °C (again on an
orbital shaker), washed in 0.3% Triton X-100 in PBS, incubated with appropriate
secondary antibodies diluted in blocking buffer for 1 h at 37 °C, washed in 0.3%
Triton X-100 in PBS and counterstained with Hoechst 33342. They were then
mounted on a slide between two magnets to ensure a flat imaging surface.

The following antibodies were used: rabbit anti-Atp6v0d2 (1:300; PA5-41359,
Thermo), goat anti-CC10 (aka Scgb1a1, 1:500; sc-9772, Santa Cruz), anti-mouse
CD45-PE (1:500; 12-0451-83, eBioscience), hamster anti-CD81 (1:500; MA1-70091,
Thermo), rabbit anti-CFTR (1:100; ACL-006, Alomone), mouse anti-Chromogranin A
(1:500; sc-393941, Santa Cruz), rat anti-Cochlin (1:500;
MABF267, Millipore), anti-mouse EpCAM-PECy7 (1:500; 324221, Biolegend),
goat anti-FLAP (aka Alox5ap, 1:500; NB300-891, Novus), goat anti-Foxi1 (1:250;
ab20454, Abcam), chicken anti-GFP (1:500; GFP-1020, Aves Labs), rabbit anti-Gnat3
(1:30; sc-395, Santa Cruz), rabbit anti-Gng13 (1:500; ab126562, Abcam),
rabbit anti-Krt13 (1:500; ab92551, Abcam), goat anti-Krt13 (1:500; ab79279,
Abcam), goat anti-Lipf (1:100; MBS121137, mybiosource.com), mouse anti-Muc5ac
(1:500; ma-3223, Thermo), mouse anti-Muc5ac (1:500; MA1-34223, Thermo),
mouse anti-p63 (1:25; GTX102425, GeneTex), rabbit anti-Tff2 (1:500; 13681-1-AP,
Proteintech), rabbit anti-Trpm5 (1:500; ACC-045, Alomone), mouse anti-tubulin,
acetylated (1:100; T6793, Sigma). All secondary antibodies were Alexa Fluor
conjugates (488, 594 and 647) and used at 1:500 dilution (Life Technologies): dk
anti-chicken 488 A-11039, dk anti-goat 488 A-11055, dk anti-mouse 488 A-21202,
dk anti-rabbit 488 A-21206, dk anti-rat 488 A-21208, dk anti-goat 594 A-11058, dk
anti-mouse 594 R37115, dk anti-rabbit 594 R37119, dk anti-hamster 647 A-21451,
dk anti-goat 647 A-21447, dk anti-mouse 647 A-31571, dk anti-rabbit 647 A-31573.

EdU was stained in fixed sections alongside the above antibody stains as previously
described.

Confocal images for both slides and whole-mount trachea were obtained with
an Olympus FV10 confocal laser scanning microscope with an oil objective.
Cells were manually counted based on immunofluorescence staining of markers
for each of the respective cell types. Cartilage rings (1 to 12) were used as
reference points in all the trachea samples to count specific cell types on the
basis of immunostaining. Serial sections were stained for the antibodies tested
and randomly selected slides were used for cell counting.

Cell dissociation and FACS. Airway epithelial cells from trachea were dissociated
using papain solution. For whole-trachea sorting, longitudinal halves of the
trachea were cut into five pieces and incubated in papain dissociation solution
at 37 °C for 2 h. For proximal-distal cell sorting, proximal (cartilage 1-4)
and distal (cartilage 9-12) trachea regions were dissected and dissociated by
papain independently. After incubation, dissociated tissues were passed through
a cell strainer, then centrifuged and pelleted at 5g for 5 min. Cell pellets
were dispersed and incubated with ovomucoid protease inhibitor (Worthington
Biochemical, cat. no. LK003182) to inactivate residual papain activity by
incubating on a rocker at 4 °C for 20 min. Cells were then pelleted and stained
with EpCAM-PECy7 (1/50, 25-5791-80, eBioscience) and CD45, CD81, or on the basis
of EGFP expression for 50 min in 25% FBS in PBS on ice. After washing, cells
were sorted by fluorescence (antibody staining and/or EGFP) on a BD FACS Aria
(BD Biosciences) using FACS Diva software and analysis was performed using
FlowJo (version 10) software.

For plate-based scRNA-seq, single cells were sorted into each well of a 96-well
PCR plate containing 5 µl of TCL buffer with 1% 2-mercaptoethanol. In addition,
a population control of 200 cells was sorted into one well and a no-cell control
was sorted into another well. After sorting, the plate was sealed with a
Microseal F, centrifuged at 80g for 1 min and immediately frozen on dry ice.
Plates were stored at -80 °C until lysate cleanup.

For droplet-based scRNA-seq, cells were sorted into an Eppendorf tube containing
50 µl of 0.4% BSA in PBS and stored on ice until proceeding to the GemCode
Single Cell Platform.

Plate-based scRNA-seq. Single cells were processed using a modified SMART-seq2
protocol as previously described. In brief, RNAClean XP beads (Agencourt)
were used for RNA lysate cleanup, followed by reverse transcription using Maxima
Reverse Transcriptase (Life Technologies), whole transcription amplification
(WTA) with KAPA HotStart HIFI 2X ReadyMix (Kapa Biosystems) for 2 cycles
and purification using AMPure XP beads (Agencourt). WTA products were quantified
with Qubit dsDNA HS Assay Kit (Thermo Fisher), visualized with high sensitivity
DNA Analysis Kit (Agilent) and libraries were constructed using Nextera
XT DNA Library Preparation Kit (Illumina). Population and no-cell controls were
processed with the same methods as single cells. Libraries were sequenced on an
Illumina NextSeq 500.

Droplet-based scRNA-seq. Single cells were processed through the GemCode
Single Cell Platform using the GemCode Gel Bead, Chip and Library Kits (V1), or
single-cell suspensions were loaded onto 3′ library chips for the Chromium Single
Cell 3′ Library (V2, PN-120233) according to the manufacturer's recommendations
(10X Genomics). In brief, single cells were partitioned into Gel Beads in Emulsion
in the GemCode instrument with cell lysis and barcoded reverse transcription of
RNA, followed by amplification, shearing and 5′ adaptor and sample index
attachment. An input of 6,000 single cells was added to each channel with a
recovery rate of roughly 1,500 cells. Libraries were sequenced on an Illumina
NextSeq 500.

qRT-PCR. Cells isolated by FACS were sorted into 150 µl TRIzol LS
(Thermo Fisher Scientific), whereas ALI culture membranes were submerged in
300 µl of standard TRIzol solution (Thermo Fisher Scientific). A standard
chloroform extraction was performed, followed by an RNeasy column-based RNA
purification (Qiagen) according to the manufacturer's instructions. When possible,
1 µg (otherwise 100 ng) RNA was converted to cDNA using SuperScript VILO kit
with additional ezDNase treatment according to the manufacturer's instructions
(Thermo Fisher Scientific). qRT-PCR was performed using 0.5 µl of cDNA,
predesigned TaqMan probes, and TaqMan Fast Advanced Master Mix (Thermo Fisher
Scientific), assayed on a LightCycler 480 in 384-well format (Roche). Assays were
run in parallel with the loading controls Hprt and Ubc, previously validated to
remain constant in the tested assay conditions. Subsequent experiments using
ferret epithelial cells were performed using the same methodology.

Human lung tissues. Human samples were obtained under a protocol approved
by the Partners Human Research Committee (IRB #2012P001079) and by
Massachusetts Institute of Technology Committee On the Use of Humans as
Experimental Subjects (IRB #1603505962A005).

Single-molecule fluorescent in situ hybridization (smFISH). Segments of
human bronchus were flash frozen by immersion in liquid nitrogen and embedded
in OCT, and 4 µm sections were collected. RNAScope Multiplex Fluorescent Kit
(Advanced Cell Diagnostics) was used per manufacturer's recommendations and
confocal imaging was carried out as described above.

Transwell cultures. Cells were cultured and expanded in complete SAGM (small
airway epithelial cell growth medium; Lonza, CC-3118) containing TGF-β/BMP4/
WNT antagonist cocktails and 5 µM ROCK inhibitor Y-27632 (Selleckbio, S101).
To initiate air-liquid interface (ALI) cultures, airway basal stem cells were
dissociated from mouse tracheas and seeded onto transwell membranes. After
reaching confluence, medium was removed from the upper chamber. Mucociliary
differentiation was performed with PneumaCult-ALI Medium (StemCell, 05001).
Differentiation of airway basal stem cells on an ALI was followed by directly
visualizing beating cilia in real time after 10-14 days.

Once air-liquid cultures were fully differentiated, as indicated by beating cilia,
treatment cultures were supplemented with 25 ng/ml of recombinant murine IL-13
(Peprotech; stock diluted in water and used fresh) diluted in PneumaCult-ALI
Medium, whereas control cultures received an equal volume of water for 72 h.
After treatment, whole ALI wells were fixed in 4% PFA, immunostained in whole
mount using the same buffers and imaged with a confocal microscope as described
above.

Airway surface physiologic parameters. Epithelia derived from Foxi1(KO)
mice (wild type, heterozygous knockout and homozygous knockout genotypes)
were grown as ALI cultures in transwells as described above, and µOCT,
particle-tracking microrheology, airway surface pH measurements and equivalent
current (Ieq) assays were used to characterize their physiological parameters
as described below.

Micro-optical coherence tomography. µOCT was performed as previously
described. In brief, airway surface liquid (ASL) depth and ciliary beat
frequency (CBF) were directly assessed via cross-sectional images of the airway
epithelium with high resolution (1 µm) and high acquisition speed (20,480 Hz
A-line rate, resulting in 40 frames per s at 512 A-lines per frame). Quantitative
analysis of images was performed in ImageJ. To establish CBF, custom code in
Matlab (Mathworks) was used to quantify Fourier analysis of the reflectance of
beating cilia. ASL depth was characterized directly by geometric measurement of
the respective layers.

Particle-tracking microrheology. Mucus viscosity was measured following the
described method.

Airway surface pH. A small probe was used to measure airway surface pH as
described.

Equivalent current assay. Equivalent current assay on mouse ALI was carried out
as described with these changes: benzamil was used at 20 µM and CFTR activation
was done only with 10 µM forskolin.

Transcriptional activation of Foxi1 in ferret basal cell cultures. For lentivirus
production and transduction, HEK 293T cells were cultured in 10% FBS, 1%
penicillin-streptomycin in DMEM. Cells were seeded at ~30% confluency and
transfected at ~90% confluency. For each flask, 22 µg of plasmid
containing the vector pLenti-dCas9-VP64 blasticidin or pLenti-MS2-p65-HSF1
hygromycin, 16 µg of psPAX2 and 7 µg pMD2 (VSV-G) were transfected using
[...]. The day after transfection, culture medium was
removed and replaced with 2% FBS-DMEM and incubated for 24 h. Lentivirus
supernatant was collected 48 h after transfection and centrifuged at 5,000 r.p.m.
for [...] min. Lentivirus was filtered with a 0.45 µm PVDF filter, concentrated by
Lenti-X concentrator (Takara), aliquoted and stored at -80 °C. Ferret basal cells
were cultured in Pneumacult-Ex medium supplemented with Pneumacult-Ex and
supplemented with hydrocortisone and 1% penicillin-streptomycin and passaged
at a [...] ratio. Cells were incubated with lentivirus for 24 h in growth medium.
At 72 h, selection was initiated (10 µg/ml blasticidin, 50 µg/ml hygromycin).
Selection was performed for 14 days for hygromycin and blasticidin with media
changes every 24 h.

To generate small guide (sg)RNA for transcriptional activation of Foxi1 in ferret
cells, gBlocks were synthesized from IDT and included all components necessary
for sgRNA production, namely: T7 promoter, Foxi1 target-specific sequence,
guide RNA scaffold, MS2 binding loop and termination signal. gBlocks were PCR
amplified and gel purified. PCR products were used as the template for in vitro
transcription using MEGAshortscript T7 kit (Ambion). All sgRNAs were purified
using MegaClear Kit (Ambion) and eluted in RNase-free water.

Foxi1 sgRNA was reverse transfected using Lipofectamine RNAiMAX
Transfection Reagent (Life Science) into ferret basal cells that stably express
dCas9-VP64 fusion protein and MS2-p65-HSF1 fusion protein. For the 0.33-cm2
ALI inserts, (1 µg) sgRNA and Lipofectamine RNAiMAX were diluted in 50 µl of
Opti-MEM. The solution was gently mixed, dispensed into insert and incubated
for 20-30 min at room temperature. Next, 300,000 cells were suspended in
150 µl Pneumacult-Ex plus medium and incubated for 24 h at 37 °C in a 5% CO2
incubator.

Short-circuit current measurements of CFTR-mediated chloride transport
in ferret. Polarized ferret basal cells with activated Foxi1 expression as well as
matched mock transfection controls (without DNA) were grown in ALI and after
three weeks short-circuit current (Isc) measurements were performed as
previously described. The basolateral chamber was filled with high-chloride
HEPES-buffered Ringer's solution (135 mM NaCl, 1.2 mM CaCl2, 1.2 mM MgCl2,
2.4 mM KH2PO4, 0.2 mM K2HPO4, 5 mM HEPES, pH 7.4). The apical chamber received
a low-chloride HEPES-buffered Ringer's solution containing a 135-mM sodium
gluconate substitution for NaCl. Isc was recorded using Acquire & Analyze
software (Physiologic Instruments) after clamping the transepithelial voltage to
zero. The following antagonists and agonists were sequentially added into the
apical chamber: amiloride (100 µM) to block ENaC channels, apical DIDS (100 µM)
to block calcium-activated chloride channels, forskolin (100 µM) and IBMX
(100 µM) to activate CFTR, and GlyH101 (100 µM) to block CFTR.

Pre-processing of 3′ droplet-based scRNA-seq data. Demultiplexing, alignment
to the mm10 transcriptome and UMI-collapsing were performed using the
Cellranger toolkit (version 1.0.1, 10X Genomics). For each cell, we quantified
the number of genes for which at least one read was mapped, and then excluded all
cells with fewer than 1,000 detected genes. Expression values Ei,j for gene i in
cell j were calculated by dividing UMI count values for gene i by the sum of the
UMI counts in cell j, to normalize for differences in coverage, and then
multiplying by 10,000 to create TPM-like values, and finally calculating
log2(TPM+1) values.
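
The filtering and normalization described above can be sketched as follows. This is a minimal illustration only, not the pipeline used in the study: the function name `filter_and_normalize`, the nested-dictionary input layout and the toy threshold in the usage note are assumptions made for the example.

```python
import math


def filter_and_normalize(umi_counts, min_genes=1000, scale=10_000):
    """Drop low-complexity cells and return log2(TPM-like + 1) values.

    umi_counts: dict mapping cell -> dict mapping gene -> UMI count
    (hypothetical layout chosen for this sketch).
    """
    normalized = {}
    for cell, counts in umi_counts.items():
        # Number of genes with at least one UMI in this cell.
        detected = sum(1 for c in counts.values() if c > 0)
        if detected < min_genes:
            continue  # cell excluded: too few detected genes
        total = sum(counts.values())
        normalized[cell] = {
            # Divide by the cell total, rescale to 10,000, then log-transform.
            gene: math.log2(c / total * scale + 1.0)
            for gene, c in counts.items()
        }
    return normalized
```

With a toy input such as `{"a": {"g1": 3, "g2": 1}, "b": {"g1": 5, "g2": 0}}` and `min_genes=2`, only cell `a` is retained, and a gene with zero counts maps to 0 after the transform.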


Selection of variable genes was performed by fitting a generalized linear model
to the relationship between the squared coefficient of variation (CV) and the mean
expression level in log-log space, and selecting genes that significantly deviated
(P < 0.05) from the fitted curve, as previously described.

Both prior knowledge and our data show that different cell types have markedly
differing abundances in the trachea. For example, 3,845 of the 7,193 cells (53.5%)
in the droplet-based dataset were eventually identified as basal cells, whereas
only 26 were ionocytes (0.4%). This makes conventional batch correction difficult
because, owing to random sampling effects, some batches may have very few (or
even zero) of the rarest cells (Extended Data Fig. 1b). To avoid this problem and
simultaneously identify maximally discriminative genes, we performed an initial
round of clustering on the set of variable genes described above, and identified
a set of 1,380 cell-type-specific genes (FDR < 0.01), with a minimum log2
fold-change of 0.25. In addition, we performed batch correction within each
identified cluster, which contained only transcriptionally similar cells,
ameliorating problems with differences in abundance. Batch correction was
performed (only on these 1,380 genes) using ComBat as implemented in the R
package sva using the default parametric adjustment mode. The output was a
corrected expression matrix, which was used as input to further analysis.

Pre-processing of plate-based scRNA-seq data. BAM files were converted to
merged, de-multiplexed FASTQ files using the Illumina Bcl2Fastq software package
v2.17.1.14. Paired-end reads were mapped to the UCSC mm10 mouse transcriptome
using Bowtie with parameters '-q --phred33-quals -n 1 -e 99999999 -l 25 -I 1
-X 2000 -a -m 15 -S -p 6', which allows alignment of sequences with one
mismatch. Expression levels of genes were quantified as transcript-per-million
(TPM) values by RSEM v1.2 in paired-end mode. For each cell, we determined
the number of genes for which at least one read was mapped, and then excluded
all cells with fewer than 2,000 detected genes. We then identified highly variable
genes as described above.

Dimensionality reduction by PCA and t-SNE. We restricted the expression
matrix to the subsets of variable genes and high-quality cells noted above, and
values were centred and scaled before input to PCA, which was implemented using
the R function prcomp from the stats package for the plate-based dataset. For
the droplet-based dataset, we used a randomized approximation to PCA,
implemented using the rpca function from the rsvd R package, with the parameter
k set to 100. This low-rank approximation is several orders of magnitude faster
to compute for very wide matrices. After PCA, significant principal components
were identified using a permutation test as previously described, implemented
using the permutationPA function from the jackstraw R package. Because of the
presence of extremely rare cells in the droplet-based dataset (as described above),
we used scores from 10 significant principal components using scaled data, and
7 significant principal components using unscaled data. Only scores from these
significant PCs were used as the input to further analysis.

For visualization purposes only (and not for clustering), dimensionality was
further reduced using the Barnes-Hut approximate version of t-SNE. This
was implemented using the Rtsne function from the Rtsne R package using
20,000 iterations and a perplexity setting of 10 and 75 for plate- and
droplet-based data, respectively. Scores from the first n principal components
were used as the input to t-SNE, in which n was 1 and 12 for plate- and
droplet-based data, respectively, determined using the permutation test
described above.

Excluding immune, mesenchymal cells and suspected doublets. Although cells
were sorted using EpCAM before scRNA-seq, 1,873 contaminating cells were
observed in the initial droplet dataset, and were comprised of 91 endothelial
cells expressing Egfl7, Sh3gl3 and Esam, 229 macrophages expressing MHCII
(H2-Ab1, H2-Aa, Cd74), C1qa and Cd68, and 1,553 fibroblasts expressing high
levels of collagens (Col1a1, Col1a2 and Col3a1). Each of these cell populations
was identified by an initial round of unsupervised clustering (density-based
clustering of the t-SNE map using dbscan from the R package fpc) as they formed
extremely distinct clusters, and then removed. In the case of the pulse-seq
dataset, the initial clustering step removed a total of 532 dendritic cells
identified by high expression of Ptprc and Cd83. In addition, 20 other cells were
outliers in terms of library complexity, which could possibly correspond to more
than one individual cell per sequencing library, or 'doublets'. As a conservative
precaution, we removed these 20 possible doublet cells with over 3,700 genes
detected per cell.

k-nearest-neighbour graph-based clustering. To cluster single cells by their
expression profiles, we used unsupervised clustering, based on the Infomap
community-detection algorithm, following approaches recently described for
single-cell CyTOF data and scRNA-seq. We constructed a k-nearest-neighbour graph
using, for each pair of cells, the Euclidean distance between the scores of
significant principal components as the metric.

The number k of nearest neighbours was chosen in a manner roughly consistent
with the size of the dataset, and set to 25 and 150 for plate- and droplet-based
data, respectively. For sub-clustering of rare cell subsets, we used k = 100, 50,
50 and 20 for tuft cells, neuroendocrine cells, ionocytes and goblet cells,
respectively.

The k-nearest-neighbour graph was computed using the function nng from the
R package cccd and was then used as the input to Infomap, implemented using
the infomap.community function from the igraph R package.
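
As an illustration of the graph-construction step only, a brute-force k-nearest-neighbour search over principal-component scores might look like the sketch below. The function name `knn_graph` and the directed edge-list output are assumptions made for this example, and the community-detection step (Infomap) is not reproduced here.

```python
import math


def knn_graph(scores, k):
    """Return directed edges (i, j) linking each cell i to its k nearest cells j.

    scores: list of per-cell vectors of principal-component scores.
    Distances are Euclidean, as in the text; the search is O(n^2), so this
    sketch is only practical for small inputs.
    """
    edges = []
    for i, a in enumerate(scores):
        # Distance from cell i to every other cell, sorted ascending.
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(scores) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges
```

For four cells at `(0, 0)`, `(0, 1)`, `(5, 5)` and `(5, 6)` with `k=1`, each cell links to its partner within the same pair, giving two well-separated groups that a community-detection method could then recover.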

Detected clusters were mapped to cell types using known markers for
tracheal epithelial subsets. In particular, because of the large proportion of basal
and club cells, multiple clusters expressed high levels of markers for these two
types. Accordingly, we merged nine clusters expressing the basal gene score above
a median log2(TPM+1) > 0, and seven clusters expressing the club gene score
above median log2(TPM+1) > 1. Calculation of a ciliated cell gene score showed
only a single cluster with non-zero median expression, so no further merging
was performed. This resulted in seven clusters, each corresponding 1-to-1 with a
known airway epithelial cell type, with the exception of the ionocyte cluster,
which we show represents a novel subset.

Rare cells (tuft, neuroendocrine, ionocyte and goblet) were sub-clustered to
examine possible heterogeneity of mature types (Fig. 4, Extended Data Fig. 8).
In each case, cells annotated as each type from the initial 3′ droplet-based
dataset (Fig. 1b, Extended Data Fig. 1d) were combined with the corresponding
cells from the pulse-seq dataset (Fig. 3h, Extended Data Fig. 6) before
sub-clustering. In the case of goblet cells, sub-clustering the combined 468
goblet cells (k = 20, above) partitioned the data into 7 groups, two of which
expressed the novel goblet cell marker Gp2 (Fig. 1) at high levels (median
log2(TPM+1) > 1). These two groups were annotated as mature goblet-1 and
goblet-2 cells (Extended Data Fig. 8j), while the other five groups were merged
and annotated as immature goblet cells. No cluster merging was performed for
sub-clustering of tuft, neuroendocrine or ionocyte cells.

Differential expression and cell-type signatures. To identify maximally specific
genes for cell types, we performed differential expression tests between each pair
of clusters for all possible pairwise comparisons (the larger clusters, namely
basal, club and ciliated cells, were down-sampled to 1,000 cells). Then, for a
given cluster, putative signature genes were filtered using the maximum FDR Q
value and ranked by the minimum log2 fold-change (across the comparisons). This
is a stringent criterion because the minimum fold-change and maximum Q value
represent the weakest effect size across pairwise comparisons. Cell-type signature
genes for the initial droplet-based scRNA-seq data (Fig. 1e, Supplementary
Table 1) were obtained using a maximum FDR of 0.05 and a minimum log2
fold-change of 0.5.
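
The "weakest comparison" criterion can be made concrete with a short sketch. It assumes thresholds of 0.05 (FDR) and 0.5 (log2 fold-change), treats both cut-offs as inclusive, and uses a hypothetical input layout; none of these details are confirmed by the text beyond the general rule.

```python
def signature_genes(pairwise, max_fdr=0.05, min_lfc=0.5):
    """Filter and rank putative signature genes for one cluster.

    pairwise: dict mapping gene -> list of (log2_fold_change, q_value)
    tuples, one per comparison of the cluster against each other cluster
    (hypothetical layout chosen for this sketch).
    """
    kept = []
    for gene, results in pairwise.items():
        worst_lfc = min(lfc for lfc, _ in results)  # weakest effect size
        worst_q = max(q for _, q in results)        # least significant test
        if worst_q <= max_fdr and worst_lfc >= min_lfc:
            kept.append((gene, worst_lfc, worst_q))
    # Rank by the minimum fold-change, strongest first.
    return sorted(kept, key=lambda g: g[1], reverse=True)
```

A gene that is strongly enriched against most clusters but only weakly against one is rejected, which is what makes this filter stringent.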

Where fewer cells were available, as is the case for full-length plate-based
scRNA-seq data (Extended Data Fig. 3b, Supplementary Table 2) or for subtypes
within cell types (Fig. 3, Extended Data Fig. 8), a combined P value across the
pairwise tests for enrichment was computed using Fisher's method (a more lenient
criterion) and a maximum FDR Q value of 0.001 was used, along with a cut-off of
minimum log2 fold-change of 0.1 for tuft and goblet cell subsets (Fig. 3c,
Extended Data Fig. 8, Supplementary Table 8). Marker genes were ranked by minimum
log2 fold-change. Differential expression tests were carried out using a two-part
'hurdle' model to control for both technical quality and mouse-to-mouse
variation. This was implemented using the R package MAST, and P values for
differential expression were computed using the likelihood-ratio test. Multiple
hypothesis testing correction was performed by controlling the false discovery
rate using the R function p.adjust.
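
For readers unfamiliar with the two statistical steps named above, the sketch below shows Fisher's method for combining P values and a false-discovery-rate adjustment. It assumes the adjustment is the Benjamini-Hochberg procedure (the text does not name the method passed to p.adjust), and the function names are illustrative.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values (each > 0) with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom the upper tail has
    the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Fisher's method is more lenient than the maximum-Q-value rule because a single very small P value can carry the combined result even when other comparisons are weak.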

Scoring cells using signature gene sets. To obtain a score for a specific set of
n genes in a given cell, a 'background' gene set was defined to control for
differences in sequencing coverage and library complexity. The background gene
set was selected for similarity to the genes of interest in terms of expression
level. Specifically, the 10n nearest neighbours in the 2D space defined by mean
expression and detection frequency across all cells were selected. The signature
score for that cell was then defined as the mean expression of the n signature
genes in that cell, minus the mean expression of the 10n background genes in
that cell.
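
One way to read this scoring scheme is sketched below. The text does not say how the 10n neighbours are distributed across the n signature genes or whether the two axes are rescaled; this example assumes 10 distinct neighbours per signature gene in the unscaled space, and all names are hypothetical.

```python
import math


def gene_stats(expr):
    """Map gene -> (mean expression, fraction of cells with expression > 0)."""
    stats = {}
    for gene, values in expr.items():
        mean = sum(values) / len(values)
        detected = sum(1 for v in values if v > 0) / len(values)
        stats[gene] = (mean, detected)
    return stats


def pick_background(expr, signature, per_gene=10):
    """Pick, for each signature gene, its nearest unused non-signature genes."""
    stats = gene_stats(expr)
    sig = set(signature)
    chosen = []
    for gene in signature:
        candidates = sorted(
            (g for g in stats if g not in sig and g not in chosen),
            key=lambda g: math.dist(stats[g], stats[gene]),
        )
        chosen.extend(candidates[:per_gene])
    return chosen


def signature_score(expr, signature, background, cell_index):
    """Mean signature expression minus mean background expression in one cell."""
    sig_mean = sum(expr[g][cell_index] for g in signature) / len(signature)
    bg_mean = sum(expr[g][cell_index] for g in background) / len(background)
    return sig_mean - bg_mean
```

Here `expr` maps each gene to a list of expression values indexed by cell. Subtracting a matched background keeps a cell with uniformly high coverage from scoring highly for every signature.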

Assigning cell-type specific transcription factors, G-protein-coupled receptors
and genes associated with asthma. A list of all genes annotated as transcription
factors in mice was obtained from AnimalTFDB, downloaded from
http://www.bioguo.org/AnimalTFDB/BrowseAllTF.php?spe=Mus_musculus. The set of
G-protein-coupled receptors (GPCRs) was obtained from the UniProt database,
downloaded from
http://www.uniprot.org/uniprot/?query=family%3A%22g+protein+coupled+receptor%22+AND+organism%3A%22Mouse+%5B10090%5D%22+AND+reviewed%3Ayes&sort=score.
To map from human to mouse gene names, human and mouse orthologues were
downloaded from Ensembl latest release 86 at
http://www.ensembl.org/biomart/martview, and human and mouse gene synonyms
from NCBI (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/).

Cell-type enriched transcription factors and GPCRs were then identified
by intersecting the list of genes enriched in each cell type with the lists of
transcription factors and GPCRs defined above. Cell-type enriched transcription
factors (Fig. 1e) and GPCRs (Extended Data Fig. 8a) were defined using the 3′
droplet-based and full-length plate-based datasets, respectively, as those with
a minimum log2 fold-change of 0.1 and a maximum FDR of 0.001, retaining
a maximum of 10 genes per cell type in the figures; complete lists are provided
in the Supplementary Table.

Gene set or pathway enrichment analysis. Gene ontology (GO) analysis of
enriched pathways in Krt13+ hillocks (Extended Data Fig. 3) was performed
using the goseq R package, using significantly differentially expressed genes
(FDR < 0.05) as target genes, and all genes expressed with log2(TPM+1) > 3 in
at least 10 cells as background. For pathway and gene sets, we used a version of
MSigDB with mouse orthologues, downloaded from
http://bioinf.wehi.edu.au/software/MSigDB/. Association of principal components
with cell types (Extended Data Fig. 7a, b) was computed using the gene-set
enrichment analysis (GSEA) algorithm implemented using the fgsea package in R.
Genes that are involved in leukotriene biosynthesis and taste transduction
pathways (Fig. 4c) were identified using KEGG (Kyoto Encyclopedia of Genes and
Genomes) and GO pathways. Specifically, genes in KEGG pathway 00590 (arachidonic
acid metabolism) or GO terms 0019370 (leukotriene biosynthetic process) or
0061737 (leukotriene signalling pathway) were annotated as
leukotriene-synthesis-associated, whereas genes in KEGG pathway 04742 (taste
transduction) were annotated as taste-transduction-associated. To identify
statistical enrichment of these taste and leukotriene pathways in tuft-1 and
tuft-2 subtypes, respectively, the hypergeometric probability of the overlap
between the marker genes for each subset (Supplementary Table 8) and the gene
sets was directly calculated using the R function fisher.test.
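
The hypergeometric overlap probability mentioned above can be written out directly. The sketch assumes a one-sided test for over-representation (equivalent to a one-sided Fisher's exact test); the text does not state which alternative was used, and the argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(overlap, n_markers, n_pathway, n_universe):
    """P(X >= overlap) when n_markers genes are drawn without replacement
    from a universe of n_universe genes, n_pathway of which are annotated."""
    max_overlap = min(n_markers, n_pathway)
    total = comb(n_universe, n_markers)
    # Sum the probability mass of every outcome at least as extreme.
    tail = sum(
        comb(n_pathway, x) * comb(n_universe - n_pathway, n_markers - x)
        for x in range(overlap, max_overlap + 1)
    )
    return tail / total
```

For a universe of 10 genes with 4 pathway members, observing 2 or more pathway genes among 3 markers has probability 40/120, so such an overlap would not be considered enriched.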

Statistical analysis of proximodistal mucous metaplasia. For the analysis
in Fig. 2h, the extent of goblet cell hyperplasia was assessed using counts of
Muc5ac+ goblet cells, normalized to counts of EGFP+ ciliated cells. To quantify
differences in the count values between the samples in different conditions
(Foxj1-EGFP mice), we fit a negative binomial regression using the glm.nb
function from the MASS package in R. Pairwise comparisons between means for
each condition were computed using post-hoc tests, and P values were adjusted for
multiple comparisons using Tukey's HSD, implemented using the function pairs
from the lsmeans package in R.

Lineage inference using diffusion maps. We restricted our analysis to the 6,848
cells in basal, club or ciliated cell clusters (95.2% of the 7,193 cells in the
initial droplet dataset), because it was unlikely that rare cells (for example,
neuroendocrine, tuft, goblet and ionocyte cells) in transitional states would be
sufficiently densely sampled. Next, we selected highly variable genes among these
three cell subsets as described above, and performed dimensionality reduction
using the diffusion map approach. In brief, a cell-cell transition matrix was
computed using the Gaussian kernel, in which the kernel width was adjusted to the
local neighbourhood of each cell, following the previously described approach.
This matrix was converted to a Markovian matrix after normalization. The right
eigenvectors vi (i = 0, 1, 2, 3, ...) of this matrix were computed and sorted in
the order of decreasing eigenvalues λi (i = 0, 1, 2, 3, ...), after excluding the
top eigenvector v0, corresponding to λ0 = 1 (which reflects the normalization
constraint of the Markovian matrix). The remaining eigenvectors vi (i = 1, 2, ...)
define the diffusion map embedding and are referred to as diffusion components
(DCk (k = 1, 2, ...)). We noticed a spectral gap between λ3 and λ4, and hence
retained DC1-DC3 for further analysis.

To extract the edges of this manifold, along which cells transition between states
(Fig. 2a), we fit a convex hull using the convhulln function from the geometry
R package. To identify edge-associated cells, any cell within d < 0.1 of an edge
of the convex hull (in which d is the Euclidean distance in diffusion space) is
assigned to that edge.
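
The edge-assignment rule can be illustrated with a point-to-segment distance. This sketch assumes that each hull edge is represented as a straight segment between two vertices in diffusion space and that the hull itself has already been computed elsewhere; the function names and input layout are hypothetical.

```python
import math


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab_len2 = sum(c * c for c in ab)
    # Projection parameter along the segment, clamped to its endpoints.
    t = 0.0 if ab_len2 == 0 else sum(x * y for x, y in zip(ap, ab)) / ab_len2
    t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def assign_cells_to_edges(cells, edges, max_dist=0.1):
    """Map each edge name to the cells lying within max_dist of that edge.

    cells: dict cell name -> coordinates in diffusion space.
    edges: dict edge name -> (start vertex, end vertex).
    """
    assignment = {name: [] for name in edges}
    for cell, coords in cells.items():
        for name, (a, b) in edges.items():
            if distance_to_segment(coords, a, b) < max_dist:
                assignment[name].append(cell)
    return assignment
```

Under this reading, a cell near a hull vertex can be assigned to more than one edge, since it may lie within the threshold of both segments that meet there.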

To identify cells associated with the Krt4+/Krt13+ population, we used
unsupervised partitioning around medoids (PAM) clustering of the cells in
diffusion space with the parameter k = 4. Edge-association of genes (or
transcription factors, Supplementary Table 7) was computed as the autocorrelation
(lag = 25), implemented using the acf function from the stats R package.
Empirical P values for each edge-associated gene were assessed using a
permutation test (1,000 bootstrap iterations), using the autocorrelation value as
the test statistic.

Genes were placed in pseudotemporal order by splitting the interval into 30
bins from 'early' to 'late', and assigning each gene the bin with the highest mean
expression. These data were smoothed using loess regression and then visualized
as heat maps (Extended Data Fig. 5).
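
The binning step can be sketched as follows for a single gene. The example assumes equal-width bins over the observed pseudotime range and at least two distinct pseudotime values; the loess smoothing step is not included, and the function name is illustrative.

```python
def peak_bin(pseudotime, expression, n_bins=30):
    """Return the index of the pseudotime bin with the highest mean expression.

    pseudotime and expression are parallel lists with one entry per cell.
    Empty bins are ignored when searching for the peak.
    """
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, x in zip(pseudotime, expression):
        # The latest cell falls exactly on the upper boundary; keep it in range.
        b = min(int((t - lo) / width), n_bins - 1)
        sums[b] += x
        counts[b] += 1
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    return max(range(n_bins), key=means.__getitem__)
```

Sorting genes by the returned bin index gives the early-to-late ordering used for the rows of a heat map.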

Pulse-seq data analysis. For the much larger pulse-seq dataset (66,265 cells),
we used a very similar, but more scalable, analysis pipeline, with the following
modifications. Alignment and UMI collapsing was performed using the
Cellranger toolkit (version 1.3.1, 10X Genomics). Log2(TPM+1) expression
values were computed using an Rcpp-based function in the R package Seurat (v2.2).
We also used an improved method of identifying variable genes. Rather than fitting
the mean-CV2 relationship, a logistic regression was fit to the cellular detection
fraction (often referred to as α), using the total number of UMIs per cell as a
predictor. Outliers from this curve are genes that are expressed in a lower
fraction of cells than would be expected given the total number of UMIs mapping
to that gene, that is, cell-type or state-specific genes. We used a threshold of
deviance “0135, producing a set of 708 variable genes. We restricted the
expression matrix to this subset of variable genes, and values were centred and
scaled while 'regressing out' technical factors (number of genes detected per
cell, number of UMIs detected per cell and cell cycle score) using the ScaleData
function before input to PCA, implemented using RunPCA in Seurat. After PCA,
significant principal components were identified using the knee in the scree
plot, which identified 10 significant principal components. Only scores from
these significant PCs were used as the input to nearest-neighbour based
clustering and t-SNE, implemented using the FindClusters (resolution parameter
r = 1) and RunTSNE (perplexity 25) methods, respectively, from the Seurat package.

Once again, owing to their abundance, the populous basal, club and ciliated cells
were spread across several clusters, which were merged using the strategy
described above: 19 clusters expressing the basal score above mean
log2(TPM+1) > 0, 12 expressing the club score above mean log2(TPM+1) > ~0.,
and 2 clusters expressing the ciliated signature above were merged to construct
the basal, club and ciliated subsets, respectively. Goblet cells were not
immediately associated with a specific cluster; however, cluster 13 (one of those
merged into the club cluster) expressed significantly elevated levels of goblet
markers Tff2 and Gp2 (P < 10^-[...], LRT). Sub-clustering this population
(resolution parameter r = 1) revealed [...] clusters, of which two expressed the
goblet score constructed using the top 25 goblet cell marker genes (Supplementary
Table 1) above mean log2(TPM+1) > 1, which were merged and annotated as goblet
cells. To identify the Krt4+/Krt13+ hillock-associated club cells, the remaining
17,700 club cells were re-clustered (resolution parameter r = 0.2) into 5
clusters, of which one expressed much higher levels (P < 10^-[...] in all cases)
of Krt4, Krt13 and a hillock score constructed using the top 25 hillock marker
genes (Supplementary Table 6). This cluster was annotated as hillock-associated
club cells.

Estimating lineage-labelled fraction for pulse-seq and conventional lineage
tracing. For any given sample (here, mouse), the certainty in the estimate of the
proportion of labelled cells increases with the number of cells obtained: the more
cells, the higher the precision of the estimate. Estimating the overall fraction
of labelled cells (conventional lineage tracing, Fig. 3, Extended Data Figs 4, 6;
or pulse-seq lineage tracing, Fig. 3, Extended Data Fig. 6) on the basis of the
individual estimates from each mouse is analogous to performing a meta-analysis
of several studies, each of which measures a proportion of the population; studies
with greater power (higher n) carry more information, and should influence the
overall estimate more, whereas low-n studies provide less information and should
not have as much influence. Generalized linear mixed models provide a framework
to obtain an overall estimate in this manner. Accordingly, we implemented a
fixed effects logistic regression model to compute the overall estimate and 95%
confidence interval.
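
To illustrate why pooling weights each mouse by its cell number, the sketch below fits the simplest fixed-effects logistic model, one with only an intercept. This is an assumption made for the example: the text does not give the model specification or software used. For an intercept-only model the maximum-likelihood estimate is the pooled proportion, and a Wald interval on the logit scale follows from the standard error sqrt(1/k + 1/(n - k)).

```python
import math


def pooled_labelled_fraction(labelled, totals, z=1.959964):
    """Pooled labelled fraction with an approximate 95% Wald interval.

    labelled, totals: per-mouse counts of labelled cells and of all cells.
    Requires 0 < sum(labelled) < sum(totals).
    Returns (estimate, lower bound, upper bound).
    """
    k, n = sum(labelled), sum(totals)
    logit = math.log(k / (n - k))
    se = math.sqrt(1.0 / k + 1.0 / (n - k))  # standard error on the logit scale

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(logit), expit(logit - z * se), expit(logit + z * se)
```

With one mouse contributing 5 labelled cells out of 10 and another 50 out of 190, the pooled estimate is 55/200 = 0.275, much closer to the larger sample than the unweighted mean of the two proportions (about 0.38).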


Testing for difference in labelled fraction for pulse-seq and conventional
lineage tracing. To assess the significance of changes in the labelled
fraction of cells in different conditions we used a negative binomial
regression model of the counts of cells at each time point, controlling for
variability amongst biological (mouse) replicates. For each cell type, we
model the number of lineage-labelled cells detected in each analysed mouse as
a random count variable using a negative binomial distribution. The frequency
of detection is modelled by using the natural log of the total number of
cells of that type profiled in a given mouse as an offset. The time point of
each mouse (0, 30 or 60 days post tamoxifen) is provided as a covariate. The
negative binomial model was fit using the R command glm.nb from the MASS
package. The P value for the significance of the change in labelled fraction
size between time points was assessed using a likelihood-ratio test, computed
using the R function anova.
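
Written out, the model described above can be sketched as follows. The symbols are introduced here for illustration and are not taken from the source: y_i is the number of lineage-labelled cells of a given type in mouse i, N_i the total number of cells of that type profiled in that mouse, t_i its time point and theta the dispersion parameter. The sketch assumes the time point enters as a single numeric covariate; if it is coded as a categorical factor, the test has one degree of freedom per non-reference level instead.

```latex
\begin{align}
  y_i &\sim \mathrm{NB}(\mu_i,\ \theta), \\
  \log \mu_i &= \log N_i + \beta_0 + \beta_1 t_i
  \quad\Longrightarrow\quad
  \frac{\mu_i}{N_i} = e^{\beta_0 + \beta_1 t_i}, \\
  \Lambda &= 2\left[\ell(\hat\beta_0, \hat\beta_1, \hat\theta)
            - \ell(\tilde\beta_0, \tilde\theta)\right]
  \ \sim\ \chi^2_1 \quad \text{under } H_0\colon \beta_1 = 0 .
\end{align}
```

Because log N_i enters with a fixed coefficient of 1, the regression coefficients describe the expected labelled fraction rather than the raw count, and the likelihood-ratio statistic compares the fitted model with a null model that omits the time-point term.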

Estimating turnover rate using quantile regression. Given the relatively few
samples (n = 9 mice) with which to model the rate of new lineage-labelled
cells, we used the more robust quantile regression, which models the
conditional median (rather than the conditional mean, as captured by
least-squares linear regression, which can be sensitive to outliers). The
fraction of labelled cells in each mouse was modelled as a function of days
post tamoxifen (Extended Data Fig. 6b) using the function rq from the R
package quantreg. Significance of association between increasing labelled
fraction and time was computed using Wald tests implemented with the
'summary.rq' function, while tests comparing the slopes of fits were
conducted using 'anova.rq'.
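
To show why median regression is more robust than least squares with so few samples, here is a small Python sketch. It is not the quantreg implementation: it relies on the fact that, for a straight-line fit, some least-absolute-deviation optimum passes through two data points with different x values, so with about nine mice every pair can simply be tried. The data are hypothetical, and the Wald and slope-comparison tests are not reproduced.

```python
from itertools import combinations

def median_regression(x, y):
    """Fit y = intercept + slope * x by minimising the sum of absolute
    residuals (quantile regression at the median, tau = 0.5).

    Brute force over pairs of points; only sensible for small n.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(abs(yk - (intercept + slope * xk)) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

# Hypothetical labelled fractions for 3 mice at each of 3 time points,
# with one outlying mouse at day 60.
days = [0, 0, 0, 30, 30, 30, 60, 60, 60]
fraction = [0.010, 0.010, 0.010, 0.040, 0.040, 0.040, 0.070, 0.070, 0.500]
intercept, slope = median_regression(days, fraction)
print(f"daily rate of new labelled cells: {slope:.4f}")
```

The fitted slope follows the eight consistent mice and is unaffected by the outlier, whereas a least-squares line would be pulled upwards by it.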

Statistics. Blinding was used for data analysis including the genotype of
mouse samples for qRT-PCR expression studies, electrophysiology studies and
characterization of physiologic parameters at the epithelial surface (pH,
ASL, mucus, CBF, viscosity).

Statistical hypothesis testing. With the exception of the LRT, which is
one-tailed, all tests used were two-tailed, and exact P values are reported,
except where they are below the threshold of numerical precision
(2.22 × 10^-16).
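
For orientation, a threshold of this kind corresponds to the machine epsilon of IEEE 754 double-precision arithmetic. The short Python sketch below shows that value and one hypothetical way of reporting P values that fall below it; the helper name and output format are illustrative assumptions, not the reporting code used for the paper.

```python
import sys

# Machine epsilon for double precision: the gap between 1.0 and the next
# representable floating-point number.
eps = sys.float_info.epsilon

def report_p(p):
    """Report an exact P value unless it is below numerical precision."""
    if p < eps:
        return f"P < {eps:.2e}"
    return f"P = {p:.3g}"

print(report_p(0.0123))
print(report_p(1e-20))
```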

Statistical analysis of qRT-PCR data. ΔΔCt values were generated by
normalization to the average of loading controls Hprt and Ubc followed by
comparison to wild-type samples. Statistical analysis was performed at the
ΔCt stage. For single comparisons all datasets passed the Shapiro-Wilk
normality test, which was followed by a post hoc two-tailed t-test. For
multiple comparisons all datasets passed the Shapiro-Wilk normality test for
equal variance. Data were then tested by two-way ANOVA, with sex as the
second level of variance. In a few specific cases, sex trended towards
significance, however, not sufficiently to justify separate analysis. Post
hoc multiple comparisons to the control group were performed using the
Holm-Sidak method. In the single case of Foxi1 KO (Fig. 5e), two heterozygous
samples were identified as outliers and removed using a standard
implementation of DBscan clustering using the full dataset of all genes
assayed using qRT-PCR. These two samples exhibited gene expression closer to
full Foxi1 knockouts and were removed from consideration. In all cases, error
bars represent the calculated 95% CI.
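
The two normalization steps described above (first to the average of the loading controls, then to wild type) can be sketched in Python as follows. The Ct values and function names are hypothetical. The sign convention is also an assumption: here a positive ΔΔCt means a higher Ct than wild type, that is, lower expression, and some reports use the opposite sign.

```python
from statistics import mean

def delta_ct(ct_target, ct_references):
    """ΔCt: target-gene Ct minus the average Ct of the loading controls."""
    return ct_target - mean(ct_references)

def delta_delta_ct(sample_dcts, wild_type_dcts):
    """ΔΔCt: each sample ΔCt minus the mean wild-type ΔCt."""
    baseline = mean(wild_type_dcts)
    return [d - baseline for d in sample_dcts]

# Hypothetical Ct values: target gene, then the two loading controls.
wild_type = [delta_ct(25.0, [20.0, 22.0]), delta_ct(25.4, [20.2, 22.2])]
knockout = [delta_ct(27.0, [20.0, 22.0])]
ddct = delta_delta_ct(knockout, wild_type)
print([round(v, 2) for v in ddct])
```

Keeping ΔCt as a separate step matters because, as stated above, the statistical tests are applied at the ΔCt stage rather than to the final ΔΔCt values.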
Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Code availability. R markdown scripts enabling the main steps of the analysis
to be performed are available from
https://github.com/adamh-broad/single_cell_airway.
Data availability. All data have been deposited in Gene Expression Omnibus
under accession code GSE103354 and in the Single Cell Portal
(https://portals.broadinstitute.org/single_cell/study/airway-epithelium), and
Source Data for Figs 1-5 are provided with the paper.
Extended Data Fig. 1 | Identifying tracheal epithelial cell types in 3′
scRNA-seq. a, Quality metrics for the initial droplet-based 3′ scRNA-seq
data. Distributions of the number of reads per cell (left), the number of the
genes detected with non-zero transcript counts per cell (centre), and the
fraction of reads mapping to the mm10 transcriptome per cell (right). Dashed
line, median; blue line, kernel density estimate. b, Cell type clusters are
composed of cells from multiple biological replicates. Fraction of cells in
each cluster that originate from a given biological replicate (n = 6 mice).
Post hoc annotation and number of cells are indicated above each pie chart.
All biological replicates contribute to all clusters (except for wild-type
mouse 1, which did not contain any of the very rare ionocytes (0.39% of all
epithelial cells)), and no significant batch effect was observed.
c, Reproducibility between biological replicates. Average gene expression
values (log2(TPM+1)) across all cells of two representative 3′ scRNA-seq
replicate experiments (r = Pearson correlation coefficient). Blue shading,
gene (point) density. d, Post hoc cluster interpretation based on the
expression of known cell type markers. t-SNE of 7,193 scRNA-seq profiles
(points), coloured by cluster assignment (top left), by expression
(log2(TPM+1)) of single marker genes, or by mean expression of several marker
genes for a particular cell type. e, Cell type clusters. Pearson correlation
coefficients (r, colour bar) between every pair of 7,193 cells (rows and
columns) ordered by cluster assignment. Inset (right), zoom of 288 cells from
the rare types. f, Gene signatures. Relative expression level (row-wise Z
score of log2(TPM+1) expression values) of cell-type-specific genes (rows) in
each epithelial cell (columns). Large clusters (basal, club) are down-sampled
to 500 cells.
Extended Data Fig. 2 | Identifying tracheal epithelial cell types in
full-length scRNA-seq. a, Quality metrics for full-length, plate-based
scRNA-seq data. Distributions of the number of reads per cell (left), the
number of the genes detected with non-zero transcript counts per cell
(centre), and the fraction of reads mapping to the mm10 transcriptome per
cell (right). b, c, High reproducibility between plate-based scRNA-seq data
from biological replicates of tracheal epithelial cells. Average expression
values (log2(TPM+1)) in two representative full-length scRNA-seq replicate
experiments (left) and in the average of a full-length scRNA-seq dataset
(right) and a population control (right) for cells extracted from proximal
(b) and distal (c) mouse trachea. Blue shading: density of genes (points).
r = Pearson correlation coefficient. d, Post hoc cluster annotation by the
expression of known cell-type markers. t-SNE of 301 scRNA-seq profiles
(points) coloured by region of origin (top left), cluster assignment (top,
second from left), or, in the remaining plots, the expression level
(log2(TPM+1)) of single marker genes or the mean expression of several marker
genes for a particular cell type. All clusters are populated by cells from
both proximal and distal epithelium except rare neuroendocrine cells, which
were only detected in proximal experiments (top left).
Extended Data Fig. 3 | High-confidence consensus cell type markers, and
cell-type-specific expression of asthma-associated genes. a, Cell type
clusters in full-length plate-based scRNA-seq data. Cell-cell Pearson
correlation coefficient (r) between all 301 cells (individual rows and
columns) ordered by cluster assignment (as in Extended Data Fig. 2d). Right,
magnified view of 17 cells (black border on left) from the rare types.
b, High confidence consensus markers. Relative expression level (row-wise Z
score of mean log2(TPM+1)) of consensus marker genes (rows, FDR < 0.01 in
both 3′ droplet and full-length plate-based scRNA-seq datasets; LRT) for each
cell type (flanking colour bar) across 7,193 cells in the 3′ droplet data
(columns, left) and the 301 cells in the plate-based dataset (columns,
right). Top 15 markers shown, complete sets are in Extended Data Fig. 1f,
Supplementary Table 3. c, Cluster-specific transcription factors in 3′
scRNA-seq data. Mean relative expression (row-wise Z score of mean
log2(TPM+1), colour bar) of the top transcription factors (rows) that are
enriched (FDR < 0.01, LRT, two-sided) in cells (columns) of each cluster.
d-f, Cell-type-specific expression of genes associated with asthma by GWAS.
d, Relative expression (Z score of mean log2(TPM+1)) of genes that are
associated with asthma in GWAS and enriched (FDR < 0.01, LRT) for
cell-type-specific expression in our 3′ scRNA-seq data. e, The significance
(-log10(FDR), Fisher's combined P value, LRT) and effect size (point size,
mean log2(fold-change)) of cell-type-specific expression and its genetic
association strength from GWAS for each gene from d. f, Distribution of
expression levels (log2(TPM+1)) in the cells in each cluster (x axis, colour
legend) for two asthma GWAS genes: Cdhr3 (top; specific to ciliated cells)
and Rgs13 (bottom; specific to tuft cells). FDRs, LRT.
Extended Data Fig. 4 | Krt13+ progenitors express a unique set of markers
distinct from mature club cells. a, Proximal versus distal specific club cell
expression. Relative expression level (row-wise Z score; colour bar) for
genes (rows) enriched in proximal and distal tracheal club cells (FDR < 0.05,
LRT) in the full-length scRNA-seq data. b, Distal epithelia differentiate
into mucous metaplasia. Goblet cell quantification (Muc5ac+/EGFP+ ciliated
cells) in Foxj1-EGFP mice (n = 6, dots) in each of four conditions in
(Fig. 22). P values, Tukey's HSD test; black bars, mean; error bars, 95% CI.
c, Krt8 does not distinguish pseudostratified club cell development from
hillock-associated club cell development. Diffusion map embedding of 6,905
cells (as in Fig. 2b) coloured either by their Krt13+ hillock membership
(left, green), or by expression (log2(TPM+1)) of specific genes (all other
panels). d, Immunostaining of hillock strata. Left: Krt13+ (green) and Trp63+
(magenta) basal (solid outline) and suprabasal (dashed outline) cells. Right:
Krt13+ (green) and Scgb1a1+ (magenta, solid outline) luminal cells.
Representative immunostaining from 3 mice. e, f, Krt13+ hillock cells are
highly proliferative. e, Co-stain of EdU (magenta) and Krt13 (green),
mice. f, Fraction of EdU+ epithelial cells in hillock
[4.8-10.5%]) and non-hillock (mean, 2.4%, 95% CI [1.8-3.1%]) areas. P values:
LRT, n = 4 mice; black bar, mean; error bars, 95% CI. g, Fraction of Krt13+
hillock cells that are club cell lineage labelled (%) decreases from day 5
(10.2%, 95% CI [0.07, 0.16]) to day 80 (5.2%, 95% CI [0.03, 0.08]). Error
bars, 95% CI; n = 3 mice (dots); P values, LRT. h, Differential expression
(log2(fold-change)) and associated significance (log10(FDR)) for each gene
(dot) that is differentially expressed in Krt13+ cells (identified using
clustering in diffusion map space) compared to all cells (FDR < 0.05, LRT).
Colour code, cell type with highest expression (for example, green shows
genes that are most highly expressed in Krt13+ hillock cells). Dots show all
the genes differentially expressed (FDR < 0.05) between Krt13+ hillock cells
and other cells. Genes with log2 fold-change > 1 are marked with large
points, whereas others are identified as small points (grey). i, Enriched
pathways in Krt13+ hillock cells. Representative MSigDB gene sets (rows) that
are significantly enriched (colour bar, -log10(FDR), hypergeometric test) in
Krt13+ hillock cells.
Extended Data Fig. 5 | Genes associated with cell fate transitions.
Relative mean expression (loess-smoothed row-wise Z score of mean
log2(TPM+1)) of significantly (P < 0.001, permutation test) varying genes
(a-d) and transcription factors (e-h) across subsets of 6,905 (columns)
basal, club and ciliated cells. Cells are pseudotemporally ordered (x axis,
all plots) using diffusion maps (Fig. 2b,
assigned to a cell fate transition if it was within d < 0.1 of an edge of the
convex hull of all points (in which d is the Euclidean distance in diffusion
space) assigned to that edge.
Extended Data Fig. 6 | Lineage tracing using pulse-seq. a, Schematic of the
pulse-seq experimental design. b, Post hoc cluster annotation by known cell
type markers. t-SNE of 66,265 scRNA-seq profiles (points) from pulse-seq,
coloured by the expression (log2(TPM+1)) of single marker genes for a
particular cell type or cell-cycle score (bottom right). c, Pulse-seq
lineage-labelled fraction of various cell populations over time. Linear
quantile regression fits (trendline) to the fraction of lineage-labelled
cells of each type (n = 3 mice per time point, dots) as a function of the
number of days after tamoxifen-induced labelling; estimated regression
coefficient, interpreted as daily rate of new lineage-labelled cells; p, P
value for the significance of the relationship, Wald test. As expected,
goblet and ciliated cells are labelled more slowly than club cells (Fig. 3d).
d, Labelled fraction of basal cells is unchanged during pulse-seq time
course, as expected. Estimated fraction (%) of cells of each type that are
positive for the fluorescent lineage label (by FACS) in each of
mice (points) per time point. P values, LRT; error bars, 95% CI.
e, Proportion of basal cell lineage labelled tuft cells at day 0 (0%, n = 2
mice, dots) and day 30 (22.9%, 95% CI [0.17, 0.30]; bars, estimated
proportions, n = 3 mice). Error bars, 95% CI; P values, LRT.
f-h, Conventional Scgb1a1 (CC10) lineage trace of rare epithelial types shows
minimal contribution to rare cell lineages. Fraction of Scgb1a1 labelled
(club cell trace) cells (%) of Gnat3+ tuft cells (f) at day 0 (n = 3 mice;
0.6%, 95% CI [0.00, 0.04]) and day 30 (n = 2 mice, 6.3%, 95% CI
[0.04, 0.11]), EGFP(Foxi1)+ ionocytes at day 30 (n = 2 mice; 2.9%, 95% CI
[0.01, 0.11]) (g), and Chga+ neuroendocrine cells at day 0 (n = 2 mice; 2.5%,
95% CI [0.01, 0.08]) and day 30 (n = 2 mice; 2.6%, 95% CI [0.01, 0.08]) (h)
after club cell lineage labelling. P values, LRT; error bars, 95% CI.
Extended Data Fig. 7 | Club cell heterogeneity and lineage tracing
hillock-associated club cells using pulse-seq. a, b, Principal components are
associated with basal to club differentiation (PC-1), proximodistal
heterogeneity (PC-2), and hillock gene modules (PC-2). a, PC-1 (x axis)
versus PC-2 (y axis) for a PCA of 17,700 scRNA-seq profiles of club cells
(points) in the pulse-seq dataset, coloured by signature scores for basal
(left), proximal club cells (centre left), distal club cells (centre right),
the Krt13+/Krt4+ hillock (right) or their cluster assignment (inset, right).
b, Bar plots show the extent (normalized enrichment score) and significance
of association of PC-1 (left) and PC-2 (right) for gene sets associated with
different airway epithelial types (x axis), or gene modules associated with
proximodistal heterogeneity (Extended Data Fig. 4a). Heat maps show the
relative expression level (row-wise Z score of log2(TPM+1) expression values,
colour bar) of the 20 genes with the highest and lowest loadings on PC-1
(left) and PC-2 (right) in each club cell (columns, down-sampled to 1,000
cells for visualization only). P values, permutation test. c, Pulse-seq
lineage tracing of hillock-associated cells. Estimated fraction (%) of cells
of each type that are positive for the fluorescent lineage label (by FACS)
from n = 3 mice (points) per time point. P values, LRT. Error bars, 95% CI.
d, Hillock-associated club cells are produced at a higher rate than all club
cells. Estimated rate (%) based on the slope of quantile regression fits to
the fraction of lineage-labelled cells of each type. P values, rank test;
error bars, 95% CI. e, f, Club cells initially labelled by pulse-seq are
associated with basal to club cell differentiation. e, Distribution of basal
signature scores for individual club cells (points) from each pulse-seq time
point and lineage label status. P value, Mann-Whitney U test. Violin plots
show the Gaussian kernel probability densities of the data, large white point
shows the mean. f, PC-1 versus PC-2 for a PCA of 17,700 scRNA-seq profiles of
club cells (points), as in a, highlighting club cells that are lineage
labelled at the initial time point (legend). g, Schematic of the more rapid
turnover of basal to club cells inside (top) and outside (bottom) hillocks.
Extended Data Fig. 8 | Heterogeneity of rare tracheal epithelial cell types.
a, Cell-type-enriched GPCRs. Relative expression (Z score of mean
log2(TPM+1)) of the GPCRs that are most enriched (FDR < 0.001, LRT) in the
cells of each tracheal epithelial cell type based on full-length scRNA-seq
data. b, Tuft cell-specific expression of type I and type II taste receptors.
Expression level (mean log2(TPM+1)) of tuft-cell enriched (FDR < 0.05, LRT)
taste receptor genes in each tracheal epithelial cell type based on
full-length scRNA-seq data. c, Tuft cell-specific expression of the type-2
immunity-associated alarmins Il25 and Tslp. Expression level of Il25 (left)
and Tslp (right) in each cell type. FDR, LRT. Violin plots show the Gaussian
kernel probability densities of the data. d, Morphological features of tuft
cells. Immunofluorescence staining of the tuft-cell marker Gnat3 (yellow)
along with DAPI (blue). Arrowhead, tuft; arrows, cytoplasmic extension.
e, f, Tuft-1 and tuft-2 sub-clusters. e, t-SNE visualization of 892 tuft
cells (points) coloured either by their cluster assignment (left, colour
legend), or by the expression level of marker genes for mature tuft cells
(Trpm5), tuft-1 (Gng13), tuft-2 (Alox5ap) subsets. f, Distribution of
expression levels of the top markers for each subset. Violin plots show the
Gaussian kernel probability densities of the data, large white point shows
the mean. FDR, LRT, n = 15 mice. g, Tuft-1 and tuft-2 subtypes are each
generated from basal cell parents. Estimated fraction of cells of each type
that are positive for the basal-cell lineage label (by FACS) from 5 mice
(points) per time point in the pulse-seq experiment. P values, LRT; error
bars, 95% CI. h, Differential expression of tuft cell-associated
transcription factors between tuft cell subtypes. Labelled genes are
differentially expressed in the tuft cell subsets (FDR < 0.01, LRT).
i, j, Mature and immature subsets are identified using marker gene
expression. The distribution of expression score (using top 20 marker genes,
Supplementary Table 1, Methods) for tuft (i), goblet (j), basal and club
cells (label on top) in each cell subset (basal and club cells down-sampled
to 1,000 cells). P values, Mann-Whitney U test. k, l, Gene signatures for
goblet-1 and goblet-2 subsets. The distribution (k) and relative expression
level (l) of marker genes that distinguish (log2 fold-change > 0.1,
FDR < 0.001, LRT) cells in the goblet-1 and goblet-2 sub-clusters (colour
bar, top and left) from the combined 3′ scRNA-seq datasets.
m, Immunofluorescence staining of the goblet-1 marker Tff2 (magenta), the
known goblet cell marker Muc5ac (green) and DAPI (blue). Solid white line:
boundary of a goblet-1 cell.
Extended Data Fig. 9 | Ionocyte characterization in situ.
a, Immunofluorescence characterization of ionocytes. Ionocytes visualized in
Foxi1-EGFP mouse. EGFP(Foxi1) appropriately marks Foxi1 antibody positive
cells (left, solid outline). EGFP(Foxi1)+ cells express canonical airway
markers Ttf1 (Nkx2-1) and Sox2 (solid outlines). EGFP(Foxi1)+ cells do not
label with basal (Trp63), club (Scgb1a1), ciliated (Foxj1), tuft
neuroendocrine (Chga) or goblet (Tff2) cell markers (dashed outline).
Ionocytes are sparsely distributed in the surface epithelium.
b, Representative whole-mount confocal image of ionocytes EGFP(Foxi1) and
ciliated cells (AcTub). c, Expression level of ionocyte markers (rows ordered
as in Fig. 5a, FDR < 0.05 LRT, full-length scRNA-seq dataset) in each airway
epithelial cell type. d, EGFP(Foxi1)+ ionocytes extend cytoplasmic appendages
(arrows). e-g, Immunofluorescence labelling of EGFP(Foxi1)+ cells in airway
regions. Submucosal gland (SMG, e), nasal respiratory epithelium (f) and
olfactory neuroepithelium (g). Dotted line separates surface epithelium (SE)
from SMG.
Extended Data Fig. 10 | Functional characterization of ionocytes.
a, Ascl3(KO) moderately decreases ionocyte transcription factors and Cftr in
ALI-cultured epithelia. Quantification (ΔΔCt) of expression in ionocyte
(Cftr: -0.82 ΔΔCt, 95% CI [±0.20]; Foxi1: -0.75 ΔΔCt, 95% CI [±0.28];
Ascl3: 10.28 ΔΔCt, 95% CI [±1.85]) and basal (Trp63), club (Scgb1a1) or
ciliated (Foxj1) markers in hetero- and homozygous Ascl3 KO (colour legend)
are normalized to wild-type littermates. The mean of independent probes (p1
and p2) was used for Cftr. n = 10 (Ascl3), 5 (Ascl3), 4 (wild-type) mice.
P values: Holm-Sidak test; error bars, 95% CI. b, Altered ASL-reflectance
intensity in Foxi1(KO) ALI culture compared to wild type. Representative μOCT
image of ASL. Red bar, airway surface liquid depth (including the periciliary
and mucus layers). Scale bar (white), 10 μm. c, d, Ionocyte depletion or
disruption does not affect ASL depth (c) as determined by μOCT, nor pH (d) in
cultured epithelia derived from homozygous Foxi1(KO) (n = 9) versus wild-type
littermates (n = 9 mice). P values, Mann-Whitney U test. e, f, Increased ΔIeq
in Foxi1(KO) epithelia. ΔIeq (y axis) in ALI cultures of wild type (WT),
heterozygous (HET) and Foxi1(KO) mice (n = 5 (WT), n = (HET), n = 6 (KO))
that were characterized for their forskolin-inducible equivalent currents
(Ieq) (e) and for currents sensitive to CFTRinh-172 (f). The inhibitor
sensitive ΔIeq values reported may underestimate the true inhibitor sensitive
current, as the inhibitor response failed to reach a steady plateau for some
samples during the time scale of the experiment. g-i, Foxi1 transcriptional
activation (Foxi1-TA) in ferret increases Cftr expression and chloride
transport. g, qRT-PCR expression quantification (ΔΔCt) of ionocyte markers in
ferret Foxi1-TA ALI (n = 4 ferrets) normalized to mock transfection
(Cftr: -1.39 ΔΔCt, 95% CI
Foxi1: -5.37 ΔΔCt, 95% CI [±0.91]; Ascl3: -0.87 ΔΔCt, 95% CI [±0.27];
Atp6v0d2: -1.18 ΔΔCt, 95% CI [±0.58] and Atp6v1e1: -0;
ΔΔCt, 95% CI [±0.11]). P values, t-test; bars, means; error bars, 95% CI.
h, i, Foxi1 activation in ferret cell cultures results in a CFTR
inhibitor-sensitive short-circuit current (ΔIsc). Representative trace (h)
and quantification (i) of short-circuit current (Isc) tracings from Foxi1-TA
ferret ALI after sgRNA reverse transfection (n = 6, light blue) versus mock
transfection (n = 5, black). j, Evolutionarily conserved ionocyte signatures.
Difference in fraction of cells in which transcript is detected and log2
fold-change between human ionocytes and all other bronchial epithelial cells.
Labelled genes are differentially expressed (log2 fold-change > 0.25 and
FDR <10-, Mann-Whitney U test). Red, consensus ionocyte markers between mouse
and human (log2 fold-change > 0.25, FDR <10-, LRT).
Genetic and transcriptional evolution
alters cancer cell line drug response


Uri Ben-David, Benjamin Siranosian, Gavin Ha, Helen Tang, Yaara Oren,
Kunihiko Hinohara, Craig A. Strathdee, Joshua Dempster, Nicholas J. Lyons,
Robert Burns, Anwesha Nag, Guillaume Kugener, Beth Cimini, Peter Tsvetkov,
Yosef E. Maruvka, Ryan O'Rourke, Anthony Garrity, Andrew A. Tubelli,
Pratiti Bandopadhayay, Aviad Tsherniak, Francisca Vazquez, Bang Wong,
Chet Birger, Mahmoud Ghandi, Aaron R. Thorner, Joshua A. Bittker,
Matthew Meyerson, Gad Getz, Rameen Beroukhim & Todd R. Golub


Human cancer cell lines are the workhorse of cancer research. Although cell
lines are known to evolve in culture, the extent of the resultant genetic and
transcriptional heterogeneity and its functional consequences remain
understudied. Here we use genomic analyses of 106 human cell lines grown in
two laboratories to show extensive clonal diversity. Further comprehensive
genomic characterization of 27 strains of the common breast cancer cell line
MCF7 uncovered rapid genetic diversification. Similar results were obtained
with multiple strains of 13 additional cell lines. Notably, genetic changes
were associated with differential activation of gene expression programs and
marked differences in cell morphology and proliferation. Barcoding
experiments showed that cell line evolution occurs as a result of positive
clonal selection that is highly sensitive to culture conditions. Analyses of
single-cell-derived clones demonstrated that continuous instability quickly
translates into heterogeneity of the cell line. When the 27 MCF7 strains were
tested against 321 anti-cancer compounds, we uncovered considerably different
drug responses: at least 25% of compounds that strongly inhibited some
strains were completely inactive in others. This study documents the extent,
origins and consequences of genetic variation within cell lines, and provides
a framework for researchers to measure such variation in efforts to support
maximally reproducible cancer research.


Human cancer cell lines have facilitated fundamental discoveries in cancer
biology and translational medicine. An implicit assumption has been that cell
lines are clonal and genetically stable, and therefore that results obtained
in one study can be readily extended to another. However, findings involving
cancer cell lines are often difficult to reproduce, leading investigators to
conclude that the findings were either weak or the studies not carefully
conducted. For example, although pharmacogenomic profiling of large
collections of cancer cell lines has proven to be mostly reproducible, some
discrepancies in drug sensitivity remain unexplained. We hypothesized that
cancer cell lines are neither clonal nor genetically stable, and that this
instability can generate variability in drug sensitivity.


Cross-laboratory comparisons 
To test the hypothesis that clonal variation exists within established cell
lines, we reanalysed whole-exome sequencing data from 106 cell lines
generated by both the Broad Institute (the Cancer Cell Line Encyclopedia
(CCLE)) and the Sanger Institute (the Genomics of Drug Sensitivity in Cancer
(GDSC)), using the same analytical pipeline for both datasets (Methods).

As expected, estimates of the allelic fraction of germline variants were
nearly identical across the two datasets (median r = 0.95), indicating that
sequencing artefacts do not substantially contribute to the erroneous
appearance of low allelic fraction calls. However, the degree of agreement in
allelic fraction for somatic variants was substantially lower (median
r = 0.86; P < 2 × 10^-6; Fig. 1a, Extended Data Fig. 1a and Supplementary
Table 1). Moreover, a median of 19% of the detected non-silent mutations
(range, 10-90%) were identified in only one of the two datasets (Extended
Data Fig. 1b). Similarly, 26% of genes that had copy number alterations
(CNAs, which are also known as copy number variants) (range, 7-99%) were
discordant (Extended Data Fig. 1c-e). These results indicate that genetic
variability across cultures of the same cell line is common. Indeed, a median
of 22% of the genome was estimated to be affected by subclonal events across
916 CCLE cell lines (Extended Data Fig. 1f), suggesting that changes in
subclonal composition may underlie the observed differences.
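
The two kinds of comparison reported above, agreement in allelic fraction and the share of calls found in only one dataset, can be illustrated with a short Python sketch. The variant identifiers and values are hypothetical. The sketch also assumes that the correlation is taken over variants called in both datasets and that discordance is measured against the union of calls; the source does not state either choice here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_calls(af_a, af_b):
    """af_a, af_b: dicts mapping a variant ID to its allelic fraction in
    each dataset. Returns (r over shared variants, discordant fraction)."""
    shared = sorted(set(af_a) & set(af_b))
    union = set(af_a) | set(af_b)
    r = pearson_r([af_a[v] for v in shared], [af_b[v] for v in shared])
    return r, (len(union) - len(shared)) / len(union)

# Hypothetical calls for one cell line sequenced at two sites.
site_a = {"v1": 0.50, "v2": 1.00, "v3": 0.25, "v4": 0.10}
site_b = {"v1": 0.50, "v2": 1.00, "v3": 0.25, "v5": 0.30}
r, discordant_fraction = compare_calls(site_a, site_b)
print(f"r = {r:.2f}, discordant = {discordant_fraction:.0%}")
```

In this toy case the shared variants agree perfectly, yet two of the five variants are seen at only one site, so the two summaries capture different aspects of divergence.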


Genetic variation across 27 MCF7 strains 
We performed extensive genomic characterization of 27 versions (hereafter
called 'strains') of the commonly used oestrogen receptor (ER)-positive
breast cancer cell line MCF7 (Methods, Extended Data Figs. 1g-n, 2a, b and
Supplementary Table 2), including 19 strains that had not undergone drug
treatment or genetic manipulation, 7 strains that carried a genetic
modification generally considered to be neutral (for example, introduction of
a reporter gene, Cas9 or a DNA barcode), and one strain (MCF7-M) that had
been expanded in vivo in mice following anti-oestrogen therapy. Strain M was
found to be an outlier, consistent with having been through strong
bottlenecks, and was therefore excluded from downstream quantitative
analyses.

Ten chromosome arms (25% of the genome) were differentially gained or lost in
a pairwise comparison of strains (Supplementary Table 3). We detected 283
genes with copy number gains and 405 genes with copy number losses (compared
to basal ploidy) in at least one strain. Only a small minority of these
changes (13% of gains and 21% of losses) were detected in all strains. Of
these changes, 7% of gains and 13% of losses were detected in only a single
strain, and the remaining events were observed variably across strains
(Fig. 1b and
Fig. 1 | Extensive genetic variation across 27 strains of the cancer cell
line MCF7. a, The distribution of pairwise allelic fraction (AF) correlations
between the Broad and the Sanger cell lines (n = 106), for germline (black)
and somatic (grey) SNVs. One-tailed paired Wilcoxon rank-sum test. b, The
number of gene-level CNAs shared by each number of MCF7 strains. Red, gains;
blue, losses. c, CNAs of two genes, PTEN and ESR1. d, The number of
non-silent point mutations shared by each number of MCF7 strains. e, The
allelic fraction of inactivating mutations in the tumour suppressor PTEN.
f, Top, unsupervised hierarchical clustering of 27 MCF7 strains based on CNA
profiles derived from low-pass whole-genome sequencing. Orange, strain M
subjected to in vivo passaging and drug treatment; blue, 11 connectivity map
strains cultured in the same laboratory without extensive passaging; green,
strains D and E cultured in the same laboratory and separated by few


Supplementary Table 4). The differential events included genes commonly
gained or lost in breast cancer (for example, TP53, PTEN, EGFR, PIK3CA and
MAP2K4; Extended Data Fig. 3a). For example, PTEN was deleted in 17 strains
and retained in the other 10 (Fig. 1c). Similarly, the oestrogen receptor
gene ESR1 was gained in 12 strains, lost in 6 and unaltered in 9 (Fig. 1c)
and this correlated with differential expression of ERα (P = 0.009; Extended
Data Fig. 3b, c and Supplementary Discussion).

Genetic variation was similarly observed at the level of point mutations,
small insertions or deletions (indels) and chromosomal translocations. Only
35% of 95 non-synonymous single nucleotide variants (SNVs) and indels that
affected the coding sequence or splice regions were shared by all strains;
29% were unique to a single strain, and the remaining were present in a
subset of strains (Fig. 1d, Extended Data Fig. 3d, Supplementary Tables 5, 6
and Supplementary Discussion). Similar, albeit lower, variability was
observed among mutations listed as recurrent in the COSMIC database,
consistent with COSMIC mutations tending to be clonal mutations of the
founding populations (Extended Data Fig. 3).
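
The summary statistics quoted above (the share of variants present in all strains, in exactly one strain, or in some intermediate number) amount to a histogram of how many strains carry each variant. A minimal Python sketch, using hypothetical variant and strain labels:

```python
from collections import Counter

def sharing_summary(calls, n_strains):
    """calls: dict mapping each variant to the set of strains carrying it.

    Returns the histogram of 'number of strains sharing a variant' plus
    the fractions shared by all strains and unique to a single strain.
    """
    histogram = Counter(len(strains) for strains in calls.values())
    total = len(calls)
    return {
        "histogram": dict(histogram),
        "shared_by_all": histogram[n_strains] / total,
        "unique": histogram[1] / total,
    }

# Hypothetical calls across three strains.
calls = {
    "m1": {"A", "B", "C"},
    "m2": {"A"},
    "m3": {"A", "B"},
    "m4": {"A", "B", "C"},
}
summary = sharing_summary(calls, n_strains=3)
print(summary)
```

The same tabulation applies to gene-level copy number events if the variant keys are replaced by gene names.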
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Fraction tga ater) 
passages: purple, strains Iand K separated by Cas introduction. Bottom, 
‘corresponding heat map ofthe CNA landscapes ofthe stains relative to 
the median CNA landscape. Red, gains; bu, losses. g Top, unsupervised 
Injerarchical clustering of 27 MC? strains, based on their non-silent SNV. 
profiles derived from deep targeted sequencing. Coloursas inf. Bottom, 
‘crresponding heat map ofthe mutation status of non-silent mutations 
fcrass strains, Mutations that were identified in a subset ofthe stains at 
‘AF >0005 are shown. Yellow, mutation present; grey, mutation absent 

h, Comparison of the magnitude of CNAs observed following multiple
freeze-thaw cycles (n = 9: R, A and versus W, X and Y), extensive
passaging (n = 5; D versus L versus AA, B versus I and P), and genetic
manipulations (n = 4: AA versus O, B versus C, I versus and K). Bar,
median; box, 25th and 75th percentiles; whiskers, 1.5 × IQR of lower and
upper quartile; circles, data points. Two-tailed Wilcoxon rank-sum test.


Unsupervised hierarchical clustering analysis, in which genetic
distance was reflected by the branch lengths of the dendrogram, generated
a branch structure that accurately reflected the history of the strains.
For example, strain M, which had been subjected to in vivo passaging
and drug treatment, was the most genetically distinct; the 11 strains
used by the connectivity map project over a 10-year period clustered
tightly together; and sibling strains D and E, which were only a few
passages apart, were the closest to each other (Fig. 1f, g and Extended
Data Fig. 3g). The genetic distance between strains appeared to be
affected more by passage number and genetic manipulation than by
freeze-thaw cycles (Fig. 1h and Extended Data Fig.).
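
As an illustration of how a genetic distance between strains can be derived from CNA calls, the following minimal sketch computes the fraction of genomic bins with discordant calls for every pair of strains. The strain profiles, the distance definition and all function names are assumptions made for illustration only; they are not the clustering procedure used in the study.

```python
from itertools import combinations

# Hypothetical per-bin CNA calls for four strains: +1 gain, 0 neutral, -1 loss.
cna_calls = {
    "D": [0, 1, 1, 0, -1, 0],
    "E": [0, 1, 1, 0, -1, 0],
    "L": [0, 1, 0, 0, -1, 1],
    "M": [1, -1, 0, 1, 0, 1],
}


def cna_distance(a, b):
    """Fraction of bins whose gain/loss/neutral call differs between two strains."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same bins")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(profiles):
    """Return {(strain_a, strain_b): distance} for every unordered pair."""
    return {
        (s, t): cna_distance(profiles[s], profiles[t])
        for s, t in combinations(sorted(profiles), 2)
    }


distances = pairwise_distances(cna_calls)

# The pair of strains with the smallest distance (sibling-like strains).
closest_pair = min(distances, key=distances.get)

# The strain with the largest summed distance to all other strains.
most_distinct = max(
    cna_calls,
    key=lambda s: sum(d for pair, d in distances.items() if s in pair),
)
```

A distance matrix of this kind could then be passed to any standard hierarchical clustering routine to obtain a dendrogram.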


Sources of variation 
Analysis of variant allelic fractions revealed extensive subclonality
across strains (Fig. 2a, b and Extended Data Fig. 5a). For example,
all 27 strains had a PIK3CA-activating mutation (G1633A), but the
allelic fraction varied from 0.21 to 0.70 (Extended Data Fig. 5b).
On the basis of allelic fractions and copy number status, 459% of all
observed mutations were determined to be subclonal (P < 0.01 in a


Fig. 2 | Genetic heterogeneity and clonal dynamics underlying genetic
variation. a, Top, unsupervised hierarchical clustering of 27 MCF7 strains
based on the allelic fractions of their non-silent SNVs. Colours as in Fig. 1.
Bottom, corresponding heat map of the allelic fractions of non-silent
mutations present in a subset of the strains. b, The distribution of allelic
fractions of non-silent mutations across strains. c, The cellular prevalence
of mutation clusters across MCF7 strains identified by a PyClone
analysis. Mutation clusters with differential abundance (a difference in
cellular prevalence (ΔCP) > 0.15), the clonal cluster (cluster 6; CP = 1


binomial test). PyClone, which reconstructs subclonal structure by
clustering mutations with similar cellular prevalence, found multiple
subclones within each MCF7 strain, with varying abundance across
strains (Fig. 2c). Indeed, for 43% of the non-silent SNVs, cellular
prevalence differed by >50% across strains (Extended Data Fig. 5c, d
and Supplementary Table 7).
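
The idea of calling a mutation subclonal from its allelic fraction and copy number can be sketched with a one-sided binomial test. This is an illustration only: it assumes a pure cell population and a mutation on a single copy, so that a clonal mutation has an expected allelic fraction of 1/copy number. The function names and example read counts are hypothetical, and the exact test used in the study may differ.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def is_subclonal(alt_reads, depth, copy_number, alpha=0.01):
    """One-sided binomial test for subclonality.

    Assumption (not from the source): a clonal mutation sits on one of
    `copy_number` copies in every cell, so its expected allelic fraction
    is 1 / copy_number. The mutation is called subclonal when observing
    `alt_reads` or fewer mutant reads would be unlikely (P < alpha).
    """
    expected_af = 1.0 / copy_number
    p_value = binom_cdf(alt_reads, depth, expected_af)
    return p_value < alpha, p_value


# Hypothetical mutations sequenced at 250x in a region with three copies.
low_af_call, low_af_p = is_subclonal(alt_reads=40, depth=250, copy_number=3)
near_clonal_call, near_clonal_p = is_subclonal(alt_reads=80, depth=250, copy_number=3)
```

In this toy example the first mutation (allelic fraction 0.16) falls well below the clonal expectation of about 0.33 and is called subclonal, whereas the second (allelic fraction 0.32) is compatible with a clonal mutation.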

We next investigated whether clonal dynamics were stochastic or the
product of selection. We barcoded MCF7 cells (strain D) and evaluated
the change in barcode representation over time under five culture
conditions, each in five replicates. We reasoned that if clonal dynamics were
stochastic, distinct barcoded populations would emerge in independent
replicates. By contrast, if pre-existing subclones were selected under
different conditions, enrichment of the same barcodes would be
observed in replicate cultures. Unsupervised hierarchical clustering
by barcode representation revealed that biological replicates clustered
together (Fig. 2d and Supplementary Table 8), indicating that
pre-existing subclones are indeed selected by changes in culture conditions.

Next, we characterized the genetic stability of three wild-type
single-cell-derived MCF7 clones and five single-cell-derived clones with a
neutral genetic manipulation (stable expression of a luciferase reporter;
Methods, Extended Data Fig. 5e and Supplementary Tables 9, 10).
Clones derived from the same parental population differed in their
mutational landscapes: a median of 15% of the non-silent SNVs
detected in the wild-type parental population (range, 13% to 16%)
were not observed in their single-cell-derived progeny or vice versa
(Extended Data Fig. 5f, g).

Moreover, the single-cell clones continued to evolve into heterogeneous
populations. We propagated two clones for 8–14 months and sequenced
their DNA at multiple time points (Supplementary Tables 8, 10).
A median of 13% of the non-silent SNVs (range, 8–16%) were not
shared between time points (Extended Data Fig. 5g). Similar results
were observed based on cytogenetic analysis (Extended Data Fig. 5h–k
and Supplementary Table 11), indicating that even single-cell-derived
clones are genomically unstable.

in all clones) and a cluster unique to MCF7-M (cluster 12) are shown.
= mutations per cluster; data are mean ± s.e.m. d, Top, unsupervised
hierarchical clustering of 27 samples of DNA-barcoded MCF7-D based
on barcode representation. Dendrogram branches are coloured by culture
condition. Bottom, corresponding heat map of barcode representation.
ETP, early time point; RPMI, RPMI 1640 medium; DMEM, DMEM
medium; DMSO, RPMI 1640 with 0.05% DMSO; ESTDEP, oestrogen-depleted
RPMI 1640 medium; BORT, bortezomib (500 nM; 48 h exposure)
followed by RPMI 1640.


Gene expression variation 
We next measured transcriptomic variation across the MCF7
strains using the L1000 assay (Supplementary Table 12).
Despite an overall similarity in their global gene expression profiles
(Fig. 3a and Extended Data Fig. 6a), the 27 strains also showed extensive
expression variation: 654 genes (median; range, 10–1,574) were
differentially expressed by at least twofold between pairs of strains
(P < 0.05, Q < 0.05), and the differentially expressed genes converged
on important biological pathways (Extended Data Fig. 6b–d and
Supplementary Table 13). Notably, the 27 strains clustered similarly
in the space of mutations and expression profiles, and the expected
downstream consequences of genetic mutations were observed in the
gene expression variation (Figs. 1f, g, 3b–g, Extended Data Fig. 6e–i
and Supplementary Table 14). For example, strains with inactivating
PTEN mutations or activating PIK3CA mutations had decreased PTEN
and increased mTOR gene expression signatures, respectively (Fig. 3e
and Extended Data Fig. 6g–i). Similarly, copy number loss of ESR1 was
associated with reduced oestrogen signalling (Fig. 3g and Extended
Data Fig. 63).

We further explored gene expression heterogeneity using single-cell
RNA sequencing of 26,465 individual cells from two parental
and four single-cell-derived clones (Methods, Extended Data
Fig. 6j–r and Supplementary Discussion). Unsupervised clustering
showed that cells from the single-cell-derived clones did not
cluster independently, but were mixed with the parental population,
indicating high similarity in overall gene expression (Fig. 3h
and Extended Data Fig. 6o). Notably, the extent of expression
heterogeneity among the single-cell-derived clones was not
substantially lower than the heterogeneity of the parental population
(Extended Data Fig. 6p), and increased with time in culture (Extended
Data Fig. 6q, r, Supplementary Table 15 and Supplementary
Discussion). These results indicate that variation in gene expression
arises de novo, in addition to reflecting selection of pre-existing
subclones.

Fig. 3 | Extensive transcriptomic variation associated with genetic
variation. a, A t-distributed stochastic neighbour embedding (t-SNE)
plot of gene expression profiles from multiple samples of nine cancer cell
lines. The 27 MCF7 strains profiled in the current study are indicated
by an asterisk in the key and are encircled in the plot. b, Unsupervised
hierarchical clustering of the strains, based on their global gene expression
profiles. Colours as in Fig. 1. c, Schematics of the analysis performed to
evaluate the association between genetic variation and transcriptional
programs. d, Arm-level gains and losses are associated with significant
up- and downregulation of genes transcribed from the aberrant arms.
e, Gene-level CNAs are associated with significant dysregulation of the
perturbed pathways. For example, upregulation of mTOR signalling
was found in strains that had lost a copy of PTEN. f, Point mutations are
associated with significant dysregulation of the perturbed pathways. For
example, upregulation of mTOR signalling was found in strains with an
inactivating PTEN mutation. g, Copy number loss of ESR1 is associated
with significant downregulation of the oestrogen response. h, A t-SNE plot
of single-cell RNA sequencing data from a parental population and three
of its single-cell-derived clones. scWT3–5, single-cell wild-type clone 3–5.


Verification in additional cell lines 
To exclude the possibility that the variation that we observed across
MCF7 strains was unique to that cell line, we repeated genomic analyses
on 23 strains of the commonly used lung cancer cell line A549
(Extended Data Fig. 2c, d and Supplementary Tables 16–20). We observed
a similar level of molecular variation across these strains (Extended
Data Fig. 7). For example, loss of CDKN2A, the most significantly
deleted gene in lung adenocarcinomas, was detected in 5 strains, but a
normal copy number was retained in the other 18 (Extended Data Fig. 7).
Whereas transcriptome analyses showed that oestrogen signalling
was the most variable pathway in MCF7 cells (Extended Data Fig. 6c
and Supplementary Table 13), KRAS signalling was the most variable
pathway in A549 (Extended Data Fig. 7 and Supplementary Table 20),
a commonly used model of KRAS-dependent cancer.

The generalizability of our findings was further confirmed by deep
targeted sequencing of multiple strains from 11 additional cell lines
(Extended Data Fig. 8 and Supplementary Tables 21–24). Notably,
genomic instability was not limited to transformed cancer cell lines
(Supplementary Discussion). For example, the variation across 15
strains of MCF10A, a non-transformed human mammary cell
line, was as high as the variation that we found in MCF7 cancer cells
(median discordance, 26%; range, 17–40%; Extended Data Fig. 8h).


Functional consequences of genomic variation 
The extensive genomic variation across strains was associated
with variation in biologically meaningful cellular properties.
We examined several measures of basic cellular function, including
doubling time and cell morphology, using quantitative live-cell
imaging (Methods). MCF7 strains varied in doubling times by as much
as 3.5-fold (median, 31 h; range, 22–78 h; Extended Data Fig. 9a, b).
Similarly, cell size and shape were highly variable across strains
(Extended Data Fig. 9e–f and Supplementary Table 25). Clustering
based on morphological traits was similar to clustering based on
genomics or transcriptomics (Extended Data Fig. 9g), and genomic
features correlated with proliferation (Extended Data Fig. 9h, i and
Supplementary Discussion).

Genomic instability also had a major effect on drug response.
We measured cell viability following treatment with 321 drugs at a
single concentration (5 μM) across the 27 MCF7 strains (Supplementary
Table 26). Of these, 55 compounds had strong activity (>50% growth
inhibition) against at least one strain. However, at least one strain
was entirely resistant (<20% growth inhibition) to 48 out of these
55 (87%) active compounds (Fig. 4a, b and Extended Data Fig. 10a).
The same phenomenon was observed at a more stringent threshold:
of 42 compounds with strong activity in at least two strains, 33 (79%)
were inactive in at least two strains (Extended Data Fig. 10b–d, j and
Supplementary Discussion). All 33 differentially active compounds
were validated in an eight-point dose–response analysis of each of the
27 strains (median Spearman's ρ = 0.42 between screens, P = 3 × 10";
Extended Data Fig. 10k, Supplementary Table 27 and Supplementary
Discussion).
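
The activity thresholds described above (>50% growth inhibition for strong activity, <20% for resistance) can be illustrated with a short sketch. The compound names, strain values and the function `summarize_response` are hypothetical and serve only to show how a compound would be flagged as differentially active under these thresholds.

```python
def summarize_response(inhibition, active=50.0, resistant=20.0, min_strains=1):
    """Classify one compound from its per-strain growth inhibition (%).

    `inhibition` maps strain name -> percentage growth inhibition.
    A compound counts as differentially active when at least `min_strains`
    strains are strongly inhibited (> active) and at least `min_strains`
    strains are resistant (< resistant).
    """
    n_active = sum(v > active for v in inhibition.values())
    n_resistant = sum(v < resistant for v in inhibition.values())
    return {
        "n_active": n_active,
        "n_resistant": n_resistant,
        "active": n_active >= min_strains,
        "differential": n_active >= min_strains and n_resistant >= min_strains,
    }


# Hypothetical single-dose screen results (percentage growth inhibition).
screen = {
    "compound_1": {"A": 85.0, "B": 78.0, "M": 12.0, "Q": 5.0},
    "compound_2": {"A": 92.0, "B": 88.0, "M": 71.0, "Q": 66.0},
    "compound_3": {"A": 10.0, "B": 8.0, "M": 3.0, "Q": 15.0},
}

summary = {name: summarize_response(values) for name, values in screen.items()}
active = sorted(name for name, s in summary.items() if s["active"])
differential = sorted(name for name, s in summary.items() if s["differential"])

# The more stringent variant requires at least two strains on each side.
stringent = summarize_response(screen["compound_1"], min_strains=2)
```

Setting `min_strains=2` corresponds to the more stringent criterion mentioned in the text, in which strong activity and inactivity must each be seen in at least two strains.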

The high degree of variability in drug response cannot be
explained by irreproducibility of the assay. First, replicate
treatments yielded highly concordant results (median Pearson's
r = 0.97, P < 2 × 10"; Extended Data Fig. 10l). Second, compounds
with the same mechanism of action had similar patterns of activity
across strains (Fig. 4a, c; P = 3 × 10~”). For example, the same activity
pattern was observed for three proteasome inhibitors (bortezomib,
MG-132 and carfilzomib; Fig. 4d), and was associated with
biochemically measured differential proteasome activity (Extended Data
Fig. 10m–o). Third, for 82% of differentially active compounds, we
found differential gene expression signatures of the mechanism of
action of the compounds between sensitive and insensitive strains
(P = 2 × 10"; Fig. 4e–h, Extended Data Fig. 10p–u and Supplementary
Tables 28, 29).

Indeed, drug response was associated with transcriptional differences
in relevant pathways. For example, strains sensitive to CDK inhibitors
had an upregulated cell cycle signature and strains sensitive to PI3K
inhibitors had an upregulated mTOR signature (Fig. 4f, g and Extended
Data Fig. 10p, q). Notably, the strains that were the most resistant
to treatment in general (strains M and Q) showed downregulation
of drug metabolism pathways (Extended Data Fig. 10v). Differences
in proliferation rate did not explain the majority of the observed
differential drug activity (median Spearman's ρ = 0.017, P = 0.60;
Supplementary Table 30).

Genetic variation could be linked directly to differential drug
response. For example, genetic inactivation of PTEN was associated
with decreased PTEN and increased AKT expression signatures
(Figs. 1c, e, 3e, f), and increased sensitivity to the AKT inhibitor IV
(Fig. 4h, i). Similarly, ESR1 loss was associated with reduced oestrogen
signalling (Figs. 1c, 3g), which was in turn associated with reduced
sensitivity to tamoxifen or oestrogen depletion (Fig. 4j and Extended
Data Fig. 10w–x). More broadly, clustering of the MCF7 strains
based on their drug response was highly similar to clustering based on
genetics or gene expression (Figs. 1g, 2a, 3b, 4a, Extended Data Fig. 11a
and Supplementary Discussion). Genome-wide CRISPR screens
showed that genetic dependencies were affected by genomic variation
similarly to pharmacological dependencies (Extended Data Fig. 11b,
Supplementary Table 31 and Supplementary Discussion), and
functional analyses revealed that single-cell-derived clones remained

Fig. 4 | Drug-response consequences of genetic and transcriptomic
variation. a, Top, unsupervised hierarchical clustering of 27 MCF7 strains
based on their response to the 55 active compounds in the primary screen.
Colours as in Fig. 1. Bottom, corresponding heat map of the percentage
of viability change for each compound across strains. Compounds are
coloured based on their mechanism of action. b, Classification of the
screened compounds based on their differential activity: consistent,
viability change < −50% for all strains; variable, viability change < −50%
for some strains and > −20% for other strains; intermediate, viability
change in between these values. c, Comparison of the similarity in drug
response patterns between compounds that share the same mechanism of
action (n = 39) and compounds that work through different mechanisms
(n = 1,439). One-tailed Wilcoxon rank-sum test. d, Highly similar
differential drug response patterns for three proteasome inhibitors:
bortezomib, MG-132 and carfilzomib. Each data point represents the
mean of two replicates. The number of data points per strain is mentioned
in parentheses. The response pattern with no drug (DMSO control)


phenotypically unstable (Extended Data Fig. 11g–i and Supplementary
Discussion).

We thus hypothesized that variation across otherwise isogenic strains
might be harnessed to discover mechanisms of drug sensitivity and
resistance. Indeed, we found that basal gene expression profiles across
the 27 MCF7 strains could be more readily connected to the mechanism
of action of active drugs than could those of larger panels of breast
cancer cell lines derived from different patients (Fig. 4k, Supplementary
Table 32 and Supplementary Discussion).

is presented for comparison. e, Schematics of the analysis performed
to evaluate the association between drug response and transcriptional
variation. f, Upregulation of the KEGG cell cycle signature in strains
sensitive to the cell cycle inhibitor alsterpaullone (8 sensitive and 15
resistant strains). g, Upregulation of mTOR signalling in strains sensitive
to the PI3K inhibitor BKM-120 (8 sensitive and resistant strains).
h, Upregulation of the genes that are upregulated when PTEN is knocked
down in strains sensitive to AKT inhibitor IV (sensitive and 9 resistant
strains). i, Strains with PTEN mutation (n = 12) respond more strongly
to AKT inhibitor IV than strains without the mutation (n = 11). j, Strains
with ESR1 copy number loss (n = 5) grow better in oestrogen-depleted
medium than strains without ESR1 loss (n = 21). k, Comparison of gene
set enrichment analysis-based MoA identification between the MCF7
cohort and the CTD2 (n = 15) and GDSC (n = 19) cohorts across matched
rays. Two-tailed Fisher's exact test. For all box plots: bar, median; box,
25th and 75th percentiles; whiskers, 1.5 × the interquartile range of the
lower and upper quartiles; circles, data points.


Discussion 
Our results show that established cancer cell lines, generally thought to
be clonal, are in fact highly genetically heterogeneous. This heterogeneity
results both from clonal dynamics (that is, changes in the abundance
of pre-existing subclones) and from continuous instability (that is, the
appearance of new genetic variants). Moreover, genetic heterogeneity
leads to varying patterns of gene expression, which in turn result in
differential drug sensitivity. These findings have a number of important
implications, which are summarized in Extended Data Table 1.

We found that changes in clonal composition underlie much of the
observed variability in cell line behaviour. Such clonal composition
changes follow selection by particular conditions (for example, growth
medium) or by genetic manipulations associated with a population
bottleneck. The genetic distance between cell line strains was strongly
correlated with their gene expression distance and with their drug
response distance. Cell line diversification can therefore be estimated
using inexpensive profiling methods (Extended Data Fig. 1). To
facilitate routine assessment of cell line diversification, we have created the
Cell STRAINER (strain instability profiler) portal
(https://cellstrainer.broadinstitute.org), where users can upload cell line
genomic data and measure their strain's genetic distance from a reference.

Variation within cancer cell lines can also be useful in at least
two ways. First, deeper characterization (for example, by single-cell
sequencing) of the heterogeneity within cultures of common cell lines
could enable the study of cooperative and competitive interactions
between cancer cell populations and mechanisms of pre-existing
drug resistance. Second, owing to their matched genetic background,
naturally occurring 'isogenic-like' strains could help to uncover the
association between molecular features and phenotypes such as drug
response.

We conclude that cancer cell lines remain a powerful tool for cancer
research, but their genomic evolution leads to a high degree of
variation across cell line strains, which must be considered in experimental
design and data interpretation.

METHODS
Data reporting. No statistical methods were used to predetermine sample size.
The experiments were not randomized and the investigators were not blinded to
allocation during experiments and outcome assessment.

Cell culture. The MCF7, HT29, MDAMB453 and A375 cell lines were cultured in
RPMI 1640 (Life Technologies) with 10% fetal bovine serum (Sigma-Aldrich) and
1% penicillin–streptomycin–glutamine (Life Technologies). The A549, VCAP, PC3,
HCC515, HepG2, HeLa and Ben-Men-1 cell lines were cultured in DMEM (Life
Technologies) with 10% fetal bovine serum (Sigma-Aldrich) and 1% penicillin–
streptomycin–glutamine (Life Technologies). The HA1E cell line was cultured in
MEM (Life Technologies), with 10% fetal bovine serum (Sigma-Aldrich) and
1% penicillin–streptomycin–glutamine (Life Technologies). The MCF10A cell
line was cultured in MEGM Mammary Epithelial Cell Growth Medium (Lonza)
supplemented with the MEGM BulletKit (Lonza). The single-cell-derived clones
scWT3, scWT4 and scWT5, as well as their parental MCF7 population, were
cultured in DMEM (Life Technologies), with 10% fetal bovine serum (Sigma-Aldrich),
1% penicillin–streptomycin–glutamine (Life Technologies) and 10 μg ml−1 insulin
(Sigma-Aldrich). Cells were incubated at 37 °C, 5% CO2 and passaged twice a week
using Trypsin-EDTA (0.25%) (Life Technologies). All strains of the same cell line
were cultured under the same conditions. Cell identity was confirmed and the cells
were confirmed to be mycoplasma-free. Cells were tested for mycoplasma
contamination using the MycoAlert Mycoplasma Detection Kit (Lonza), according to the
manufacturer's instructions. Cell line identity was confirmed using SNP-based
DNA fingerprinting (see below).

Derivation of single-cell clones. The wild-type single-cell-derived MCF7 clones
were generated by cell sorting. Single cells were sorted into individual wells of a
96-well plate, using a BD FACSAria II SORP Cell Sorter. Three resultant clones
were expanded for a period of approximately three months before the experiments.
The genetically manipulated single-cell-derived MCF7 GREB1 and MCF7 ESR1
clones were generated using CRISPR–Cas9-mediated genome engineering to insert
a NanoLuciferase reporter gene into the 3′ UTR of the respective genes. In brief,
a selectable reporter gene cassette was engineered using the EMCV IRES element to
drive expression of the destabilized NLucP reporter gene (Promega) fused to the
N terminus of the BSR blasticidin-resistance gene (InvivoGen) containing a P2A
self-cleaving peptide element. For targeting GREB1, the reporter gene cassette was
subcloned into a construct containing approximately Kb of the GREB1 gene
surrounding the termination codon in exon 33, such that the reporter gene cassette is
located Sop downstream of the GREB1 termination codon in the resulting
hybrid mRNA transcript. A GREB1-specific sgRNA was generated that recognized the
sequence GCTGACGGGACGACACATCTG on the sense strand, using a PAM site
that is adjacent to the GREB1 termination codon. For targeting ESR1, the reporter
gene cassette was subcloned into a construct containing approximately 2 kb of the
ESR1 gene surrounding the termination codon in exon 8, such that the reporter gene
cassette is located 21 bp downstream of the ESR1 termination codon in the resulting
hybrid mRNA transcript. An ESR1-specific sgRNA was generated that recognized
the sequence GTCTCCAGCAGCAGGTCATAG on the anti-sense strand, using a
PAM site that is Db upstream of the ESR1 termination codon. Corresponding
Cas9/sgRNA and targeting construct pairs were transiently transfected into MCF7
cells using Lipofectamine 2000 (Thermo Fisher Scientific). After growth for seven
days, the cells were cultured in medium containing 5 μg ml−1 blasticidin to select
for the desired recombinants. Single-cell clones were then isolated by limiting
dilution single-cell cloning in 96-well plates.

Growth rate analysis. Cells were seeded in triplicates in white, clear-bottom,
96-well plates (Corning, 3903), at a density of 000 cells per well. Plates were
incubated in an IncuCyte ZOOM instrument (Essen BioScience) at 37 °C, 5%
CO2. Four non-overlapping phase-contrast images (10×) were taken every 2 h
for a total of 16. IncuCyte ZOOM software (version 2015A) was used to calculate
the mean confluence per well at each time point (filtered to exclude objects
smaller than 100 μm2) and averaged across wells to calculate the mean
confluence per strain. Doubling times were calculated for each strain, using the formula
$\Delta T \times \log(2) / (\log(c_{\max}) - \log(c_{\min}))$, in which $c_{\min}$ and
$c_{\max}$ were the minimum and maximum percentage confluency during the
linear growth phase, respectively, and $\Delta T$ was the time elapsed between
them. To account for potential differences in
cell recovery following seeding, 0 h was defined as the first time point in which
the mean strain confluency surpassed a threshold of 15%. To examine the effect
of oestrogen depletion on the growth of MCF7 strains, cells were cultured either
in standard conditions (described above) or in oestrogen-depleted conditions:
RPMI 1640 without phenol red (Life Technologies) with 10% charcoal-stripped
fetal bovine serum (Life Sciences) and 1% penicillin–streptomycin–glutamine (Life
Technologies). Comparison between standard and oestrogen-depleted conditions
was performed by calculating the fold change in doubling time between the two
conditions.
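
The doubling-time calculation can be sketched as follows, assuming the standard exponential-growth form in which the elapsed time is multiplied by log(2) and divided by the difference of the log confluencies. The function names and the example confluence series are hypothetical, and the snippet is not the IncuCyte software's own procedure.

```python
import math


def doubling_time(delta_t_h, c_min, c_max):
    """Doubling time (h) from the confluence change over `delta_t_h` hours.

    Assumes exponential growth between the minimum (`c_min`) and maximum
    (`c_max`) percentage confluence of the growth phase being analysed.
    """
    if not 0 < c_min < c_max:
        raise ValueError("expected 0 < c_min < c_max")
    return delta_t_h * math.log(2) / (math.log(c_max) - math.log(c_min))


def time_zero_index(confluence, threshold=15.0):
    """Index of the first time point whose mean confluence exceeds the threshold."""
    for i, value in enumerate(confluence):
        if value > threshold:
            return i
    raise ValueError("confluence never exceeds the threshold")


# Hypothetical mean confluence (%) for one strain, imaged every 2 h.
hours = [0, 2, 4, 6, 8, 10]
confluence = [12.0, 14.0, 16.0, 20.0, 25.0, 32.0]

start = time_zero_index(confluence)   # first point above 15% confluence
elapsed = hours[-1] - hours[start]    # hours between the two measurements
dt = doubling_time(elapsed, confluence[start], confluence[-1])
```

In this toy series the confluence doubles from 16% to 32% in 6 h, so the estimated doubling time is 6 h. The fold change between standard and oestrogen-depleted conditions would then be the ratio of two such estimates.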

Cell painting. Cells were plated in triplicate at a density of 1,000 cells per well,
and then stained and fixed as previously described. Images were taken on a
PerkinElmer Opera Phenix microscope with a 20×/1.0 NA water immersion lens.
Image quality control was carried out as previously described, using CellProfiler
and CellProfiler-Analyst. For all 27 MCF7 strains, the majority of images in all
three wells passed quality control, and therefore all strains were further considered.
Image illumination correction and analysis were performed in CellProfiler. For
each of the 27 MCF7 strains, the median value of the 1,744 measured features was
computed and used for hierarchical clustering.

DNA and RNA extraction. Genomic DNA was extracted using the DNeasy Blood
& Tissue Kit (Qiagen), according to the manufacturer's protocol. Total RNA was
extracted using the RNeasy Plus Mini Kit (Qiagen), according to the
manufacturer's protocol.

DNA fingerprinting. Fingerprinting analysis was performed using 44
polymorphic loci. 'GenotypeConcordance' (Picard tools) was used to calculate the
concordance between each pair of samples (separately for the MCF7 and A549 cohorts).
Samples with >95% concordance were called a match.
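
The matching rule can be illustrated with a simplified pairwise concordance calculation. This sketch is not the Picard GenotypeConcordance implementation: the genotype encoding (two-letter allele strings), the locus names and the helper functions are assumptions made purely for illustration.

```python
def concordance(genotypes_a, genotypes_b):
    """Fraction of loci genotyped in both samples that carry the same alleles."""
    shared = [locus for locus in genotypes_a if locus in genotypes_b]
    if not shared:
        raise ValueError("no shared loci to compare")
    same = sum(
        sorted(genotypes_a[locus]) == sorted(genotypes_b[locus]) for locus in shared
    )
    return same / len(shared)


def is_match(genotypes_a, genotypes_b, threshold=0.95):
    """Call two samples a match when concordance exceeds the threshold."""
    return concordance(genotypes_a, genotypes_b) > threshold


# Hypothetical genotypes at 44 polymorphic loci.
reference = {f"locus_{i}": "AG" for i in range(44)}

# Same cell line: allele order differs at one locus, one genuine mismatch.
same_line = dict(reference, locus_0="GA", locus_1="AA")

# Different cell line: ten discordant loci.
other_line = dict(reference)
other_line.update({f"locus_{i}": "GG" for i in range(10)})

same_line_concordance = concordance(reference, same_line)
other_line_concordance = concordance(reference, other_line)
```

With 44 loci, a single discordant genotype still gives 43/44 (about 97.7%) concordance and therefore a match, whereas three or more discordant loci fall below the 95% threshold.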

Ultra-low-pass whole-genome DNA sequencing. Copy number characterization
was performed using low-pass (approximately 0.2× coverage) whole-genome
sequencing. Libraries were prepared from 50 ng of DNA using ThruPLEX DNA-seq
sample preparation kits (Rubicon Genomics) according to the manufacturer's
protocol. The resultant libraries were quantified using a Qubit fluorometer (Agilent
TapeStation 2200) and RT–qPCR using the Kapa Library Quantification kit (Kapa
Biosystems), according to the manufacturer's protocol. Uniquely indexed libraries
were pooled in equimolar ratios and sequenced on a single Illumina NextSeq500
run with paired-end 35-bp reads, at the Dana-Farber Cancer Institute Molecular
Biology Core Facilities. The reads were aligned to the UCSC hg19 reference
genome, using BWA-MEM (0.7.15), with default parameters.

Ultra-low-pass whole-genome DNA-sequencing data analysis. The ichorCNA
algorithm was applied to identify CNAs of large genomic segments, chromosome
arms and whole chromosomes. First, the genome was divided into 1-Mb
bins and read counts were generated for each bin using the HMMcopy Suite
(http://compbio.bccrc.ca/software/hmmcopy/). The raw read counts were then
normalized to correct for GC content and mappability biases using the HMMcopy
package, generating corrected read counts for each 1-Mb bin. Segmentation
and copy number prediction for each sample were performed using ichorCNA
v.0.1.0 (https://github.com/broadinstitute/ichorCNA), which is optimized
for low-coverage whole-genome sequencing. Parameters were initialized
based on prior knowledge: normal = 1.0, ploidy = c(3,3.5,4), -txnE = 0.99999,
-txnStrength = 10,00, -maxCN —. Remaining parameters were set to the default.
For bin-level comparison between strains, we used the log-transformed corrected
read counts and determined gain and loss status using thresholds of 0.1 and −0.1,
respectively. For arm-level calls, the copy number status was determined based on
the largest overlapping segment.
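
The two calling rules described above can be sketched in a few lines. The snippet assumes that the thresholds are applied as strict inequalities and that segments are given as (start, end, call) tuples; both choices, and all names and example values, are illustrative assumptions rather than details taken from the ichorCNA pipeline.

```python
def call_bin(log_ratio, gain_threshold=0.1, loss_threshold=-0.1):
    """Gain/loss/neutral status of one bin from its corrected log read count."""
    if log_ratio > gain_threshold:
        return "gain"
    if log_ratio < loss_threshold:
        return "loss"
    return "neutral"


def arm_level_call(arm_start, arm_end, segments):
    """Copy number status of a chromosome arm.

    `segments` is a list of (start, end, call) tuples. The call of the
    segment with the largest overlap with the arm is returned, or None
    if no segment overlaps the arm.
    """
    best_call, best_overlap = None, 0
    for start, end, call in segments:
        overlap = min(end, arm_end) - max(start, arm_start)
        if overlap > best_overlap:
            best_call, best_overlap = call, overlap
    return best_call


# Hypothetical corrected log read counts for five 1-Mb bins.
bin_calls = [call_bin(x) for x in [0.25, 0.05, -0.3, -0.1, 0.1]]

# Hypothetical segments (coordinates in Mb) on an arm spanning 0-100 Mb.
arm_call = arm_level_call(0, 100, [(0, 30, "gain"), (30, 120, "loss"), (150, 200, "gain")])
```

Here the arm is called as a loss because the loss segment covers 70 Mb of the arm, compared with 30 Mb for the gain segment.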

Deep targeted sequencing. Deep (approximately 250× coverage) targeted exon
sequencing of 447 genes that are commonly mutated in cancer was performed
(Profile OncoPanel v.3). Prior to library preparation, DNA was fragmented
(Covaris sonication) to 250 bp and further purified using Agencourt AMPure XP
beads. Size-selected DNA was ligated to sequencing adaptors with sample-specific
barcodes during automated library preparation (SPRIworks, Beckman-Coulter).
Libraries were pooled and sequenced on an Illumina MiSeq to estimate library
concentration based on the number of index reads per sample. Library
construction was considered to be successful if the yield was >250 ng, and all samples had
sufficiently high yields. Normalized libraries were pooled in batches, and hybrid
capture was performed using the Agilent SureSelect Hybrid Capture kit with the
POPv3_824272 bait set. The list of 447 genes included in POPv3_824272 is
provided as Supplementary Table 2. Captures were then pooled and sequenced
on one HiSeq3000 lane. Pooled sample reads were deconvoluted and sorted using
Picard tools (http://broadinstitute.github.io/picard). The reads were aligned to the
reference sequence b37 edition from the Human Genome Reference Consortium
using 'bwa aln' (http://bio-bwa.sourceforge.net/bwa.shtml), with the following
parameters: -q 5 -l 32 -k 2 -o 1, and duplicate reads were identified and removed
using Picard tools. The alignments were further refined using the GATK tool
for localized realignment around indel sites
(https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_indels_IndelRealigner.php).
Recalibration of the quality scores was also performed using GATK tools
(http://gatkforums.broadinstitute.org/discussion/44/base-quality-score-recalibration-bqsr).
Metrics for the representation of each sample in the pool were generated on the
unaligned reads after sorting on the barcode
(http://broadinstitute.github.io/picard/picard-metric-definitions.html).
All samples achieved the CCGD recommended threshold of >30× coverage for
80% of the targeted bases. Average mean exon target coverage was 251.5× (range:
171.5–336.27×) for the MCF7 samples, 288.9× (range, 208.2–398.9×) for the A549
samples and 257.32× (range, 211.7–442.68×) for the additional cell line samples.
Targeted sequencing data analysis. Mutation analysis for SNVs was performed
using MuTect v.1.1.4. Indel calling was performed using the SomaticIndelDetector
tool in GATK (http://www.broadinstitute.org/cancer/cga/indelocator).
Consecutive variants in the same codon were reannotated to maximize the effect
on the codon and marked as 'Phased' variants. MuTect was run in paired mode,
pairing the MCF7 or A549 samples to a normal sample, CEPH1408. Mutations
were called if detected in >2% of the reads (AF > 0.02). All SNVs, indels and
phased variants were annotated with Variant Effect Predictor. Variants were
filtered against the 6,500 exome release of the Exome Sequencing Project database.
Variants that were represented more than once in either the African or European
American populations and were found less than twice in COSMIC were considered
to be germline (given that no matched normal samples were available). A germline
filter was not applied to the downstream analyses, as changes in such mutations
between strains of the same cell line would have to arise in culture and may be
functionally relevant. Non-silent mutations were considered to be those with the
following BestEffect Variant Classification: missense, initiator codon, nonsense,
splice acceptor, splice donor, splice region, frameshift, inframe insertion or inframe
deletion. Mutations that appeared more than once in COSMIC were regarded as
COSMIC mutations. The complete lists of variants (SNVs, indels and phased) for
MCF7, A549 and additional cell lines are provided in Supplementary Tables 5, 17
and 23, respectively.
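
The allele-fraction threshold and the population-based germline flag described above can be sketched as follows. This is a minimal illustration only: the field names (`af`, `esp_aa_count`, `esp_ea_count`, `cosmic_count`) and the example records are hypothetical and do not reflect the actual pipeline output.

```python
def is_called(variant, min_af=0.02):
    """A mutation is called when it is detected in >2% of the reads."""
    return variant["af"] > min_af


def is_putative_germline(variant):
    """Flag variants seen more than once in either ESP population and
    fewer than twice in COSMIC (field names are assumptions)."""
    common_in_esp = variant["esp_aa_count"] > 1 or variant["esp_ea_count"] > 1
    rare_in_cosmic = variant["cosmic_count"] < 2
    return common_in_esp and rare_in_cosmic


# Hypothetical variant records, for illustration only.
variants = [
    {"id": "v1", "af": 0.45, "esp_aa_count": 12, "esp_ea_count": 0, "cosmic_count": 0},
    {"id": "v2", "af": 0.30, "esp_aa_count": 3, "esp_ea_count": 5, "cosmic_count": 40},
    {"id": "v3", "af": 0.01, "esp_aa_count": 0, "esp_ea_count": 0, "cosmic_count": 7},
]

called = [v["id"] for v in variants if is_called(v)]
germline = [v["id"] for v in variants if is_called(v) and is_putative_germline(v)]
```

Here `v2` is frequent in the population database but also recurrent in COSMIC, so it is not flagged as germline.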

CNAs were identified using RobustCNV, an algorithm that relies on localized
changes in the mapping depth of sequenced reads in order to identify changes
in copy number at the sampled loci (M. Ducar et al., manuscript in preparation).
Systematic bias in mapping depth was reduced using robust regression, fitting
the observed tumour mapping depth against a panel of normal samples captured
using the same bait set. Observed values were then normalized against predicted
values and expressed as log2 ratios. A second normalization step was performed
to remove GC bias, using a LOESS fit. log2 ratios were centred on segments that
were determined to be diploid based on the allele fraction of heterozygous SNPs in
the targeted panel. Normalized coverage data were next segmented using Circular
Binary Segmentation with the DNAcopy Bioconductor package. Finally, segments
were assigned gain, loss or normal-copy calls using cutoffs derived from the
within-segment standard deviation of post-normalized mapping depths. Owing
to the high data quality and low within-segment standard deviation, a cutoff of
around 0.1 was applied to all samples. Segment calls were summarized to gene
calls by assigning them to capture intervals, and then counting the interval calls
for each gene. Gene-level calls were determined according to the following rules:
'gain' = '+' calls >50%; 'loss' = '-' calls >2 times and 100%; 'gain + loss' = '-' calls >2
times and calls <50%; 'mixed' = '+' and '-' calls in the same gene, but below
threshold; 'normal+' = '+' calls, but below threshold; 'normal-' = '-' calls, but
below threshold; 'normal' = no '+' or '-' calls. The complete list of CNAs for
MCF7, A549 and additional cell lines are provided as Supplementary Tables 4, 16
and 22, respectively.
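
As a rough illustration of the log2-ratio step and the segment-level calls, the sketch below normalizes an observed depth against a predicted depth and applies a symmetric cutoff of 0.1. It assumes the cutoff is applied as +0.1 for gain and -0.1 for loss; the robust regression, GC correction and segmentation performed by RobustCNV and DNAcopy are not reproduced, and the depth values are made up.

```python
import math


def log2_ratio(observed_depth, predicted_depth):
    """Observed mapping depth normalized against the predicted depth."""
    return math.log2(observed_depth / predicted_depth)


def call_segment(ratio, cutoff=0.1):
    """Assign a gain, loss or normal-copy call from a segment log2 ratio.
    The symmetric +/- cutoff is an assumption made for this sketch."""
    if ratio > cutoff:
        return "gain"
    if ratio < -cutoff:
        return "loss"
    return "normal"


# Hypothetical (observed, predicted) depths for three segments.
segments = [(120.0, 100.0), (100.0, 100.0), (80.0, 100.0)]
calls = [call_segment(log2_ratio(obs, pred)) for obs, pred in segments]
```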

For a subset of 60 genes (listed in Supplementary Table 2), rearrangements
(structural variants) were detected using BreaKmer, which is designed to detect
larger genomic structural variations from single-sample-aligned short-read
target-captured high-throughput sequencing data. In brief, the method extracts
'misaligned' sequences from a targeted region, such as split reads and unmapped
mates, assembles a contig from these reads, and re-aligns the contig to make a
variant call. It classifies detected variants as insertions/deletions, tandem
duplications, inversions and translocations. The complete list of structural variants
for MCF7 and A549 are provided in Supplementary Tables 6 and 18, respectively.
Rearrangements were visualized using the Circos visualization tool.
Clonality analysis. To resolve clonal dynamics and composition, we applied the
PyClone algorithm v.0.13.0 (https://bitbucket.org/aroth85/pyclone/wiki/Home) to
the measured allelic fractions, accounting for copy number, loss of heterozygosity
and cellularity. PyClone enabled us to follow clonal dynamics throughout the
evolution of cell populations. For copy number input, we used results from
ichorCNA segmentation and copy number predictions. Mutations with <50 read
depth were excluded. The following parameters were used for PyClone: 10,000
iterations, 1,000 burn-in, total copy-number for the prior. We also performed
multi-sample analysis using PyClone, to determine the changes in clonal
composition across strains. For the multi-sample analysis, mutations were selected as
the union set across all 27 strains. The same parameters were used for PyClone
multi-sample analysis as for the individual-sample runs.
DNA barcoding experiment. Degenerate oligonucleotides for sgRNA-barcode
library construction were synthesized by IDT and cloned into lentiGuide-Puro by
Gibson assembly, as previously described. Approximately 300 ng of Gibson
product was transformed into 25 µl of Endura electrocompetent cells (Lucigen). After a
1-h recovery period, 0.1% of transformed bacteria were plated in a tenfold dilution
series on ampicillin-containing plates to determine the number of successful
transformants. The remainder of the transformed bacteria were cultured in 50 ml of LB
with 50 µg/ml ampicillin for 16 h at 30 °C. Plasmid libraries were extracted using
Plasmid MidiPlus kit (Qiagen) and sequenced to a depth of 6.2 million reads
on an Illumina MiniSeq, corresponding to 6× coverage of >1 million barcodes.


Lentivirus was prepared by transfecting a total of 10 million HEK293FT cells as
previously described. The MCF7-D strain was cultured in standard conditions
(described above) and four million cells were infected with a low multiplicity
of infection (20-30%) to reduce the probability of each cell being infected with
more than one barcode. Cells underwent puromycin selection and the final cell
pool contained approximately 160,000 unique barcodes. Cells were expanded for
the experiment, and five million cells were then plated into 25 tissue culture flasks.
Five culture conditions were then applied (with five replicates per condition): (1)
RPMI 1640 (Life Technologies) with 10% fetal bovine serum (Sigma-Aldrich)
and 1% penicillin-streptomycin-glutamine (Life Technologies); (2) DMEM (Life
Technologies) with 10% fetal bovine serum (Sigma-Aldrich) and 1%
penicillin-streptomycin-glutamine (Life Technologies); (3) RPMI 1640 without phenol red
(Life Technologies), with 10% charcoal-stripped fetal bovine serum (Life Sciences)
and 1% penicillin-streptomycin-glutamine (Life Technologies); (4) RPMI 1640
(Life Technologies) with 10% fetal bovine serum (Sigma-Aldrich), 1%
penicillin-streptomycin-glutamine (Life Technologies) and 0.05% DMSO (Sigma-Aldrich);
(5) RPMI 1640 (Life Technologies) with 10% fetal bovine serum (Sigma-Aldrich)
and 1% penicillin-streptomycin-glutamine (Life Technologies), supplemented
for the first 48 h with 500 nM bortezomib (Selleckchem, S1013). After five weeks
of culture, DNA was extracted and barcode abundance was assessed by DNA
sequencing, as previously described. Libraries were sequenced to a median depth
of 4.2 million reads, corresponding to a barcode coverage of >26×.
Transcriptional profiling with L1000. The L1000 expression-profiling assay
was performed as previously described. First, mRNA was captured from cell
lysate using an oligo dT-coated 384-well Turbocapture plate. The lysate was then
removed, and a reverse-transcription mix containing MMLV was added. The plate
was washed and a mixture containing both upstream and downstream probes for
each gene was added. Each probe contained a gene-specific sequence, along with
a universal primer site. The upstream probe also contained a microbead-specific
barcode sequence. The probes were annealed to the cDNA over a 6-h period, and
then ligated together to form a PCR template. After ligation, Hot Start Taq and
universal primers were added to the plate. The upstream primer was biotinylated
to allow later staining with streptavidin-phycoerythrin. The PCR amplicon was
then hybridized to Luminex microbeads via the complementary, probe-specific
barcode on each bead. After overnight hybridization, the beads were washed and
stained with streptavidin-phycoerythrin to prepare them for detection in Luminex
FlexMap 3D scanners. The scanners measured each bead independently and
reported the bead colour and identity and the fluorescence intensity of the stain.
A deconvolution algorithm converted these raw fluorescence intensity
measurements into median fluorescence intensities for each of the 978 measured genes,
producing the GEX level data. These GEX data were then normalized based on an
invariant gene set, and then quantile-normalized to produce QNORM level data.
An inference model was applied to the QNORM data to infer gene expression
changes for a total of 10,174 features. Per-strain gene expression signatures were
calculated using a weighted average of the replicates, for which the weights are
proportional to the Spearman correlation between the replicates.
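
A minimal sketch of a correlation-weighted replicate average is given below. It assumes that each replicate's weight is its mean Spearman correlation with the other replicates, floored at a small positive value (`min_weight`, a hypothetical parameter) and rescaled to sum to 1; the text above does not specify these details, and the toy profiles are invented.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def weighted_signature(replicates, min_weight=0.01):
    """Average replicate profiles with weights proportional to each
    replicate's mean Spearman correlation with the others (assumption)."""
    n = len(replicates)
    raw = []
    for i in range(n):
        others = [spearman(replicates[i], replicates[j]) for j in range(n) if j != i]
        raw.append(max(sum(others) / len(others), min_weight))
    weights = [w / sum(raw) for w in raw]
    n_genes = len(replicates[0])
    signature = [sum(weights[r] * replicates[r][g] for r in range(n)) for g in range(n_genes)]
    return signature, weights


# Three toy replicate profiles over four genes; the third is discordant.
replicates = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [4.0, 1.0, 3.0, 2.0]]
signature, weights = weighted_signature(replicates)
```

With this weighting the discordant third replicate contributes very little to the final signature.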
Transcriptional profiling data analysis. To examine how newly profiled MCF7
and A549 cells compared in gene expression to a previously acquired collection of
cell line profiles (untreated samples that served as controls for connectivity map
perturbational experiments), we used t-SNE. Profiles were restricted to untreated
profiles from the nine core connectivity map cell lines, and to batches with
multiple untreated profiles. Because samples were first clustered based on their
project codes, batch effect was next removed using the ComBat algorithm.
t-SNE analysis was applied to the batch-corrected data and visualized using a scatter
plot. Analysis was completed using the 'Rtsne' package version 0.13. For the
comparison of transcriptional variation across the nine core connectivity map
cell lines, the collection of untreated profiles generated with the L1000 assay was
used. Five profiles from each cell line were randomly chosen, and the expression
variance of the 978 L1000 landmark genes was calculated for each cell line. For the
comparison of L1000 gene expression data to the Cancer Cell Line Encyclopedia
(CCLE) gene expression profiles, RNA-sequencing (RNA-seq) and Affymetrix
gene expression profiles were downloaded from the CCLE website
(https://portals.broadinstitute.org/ccle/data). Data within each platform were processed using
invariant set scaling, which adjusts profiles according to the expression of 80 invariant
genes, followed by quantile normalization. The ranked gene expression order of
the 978 landmark genes was compared using a Spearman correlation.
Chemical screening. MCF7 strains were tested against a small-molecule
Informer Set library of 321 anti-cancer compounds, assembled by the Cancer
Target Discovery and Development (CTD2; https://ocg.cancer.gov/programs/
ctd2/data-portal), using the same principles as those described in the Cancer
Therapeutics Response Portal. The list of screened compounds is included as
Supplementary Table 26. Cells were seeded in their culture medium in white,
384-well plates (Corning, 3570) at an initial density of 2,500 cells per well and incubated
overnight at 37 °C, 5% CO2. The next day, 25 nl (for primary screen) or 100 nl
(for confirmation dose-response screen) of compound stocks in DMSO were
added by pin transfer. Plates were incubated for 72 h, cooled at room temperature
for 10 min, and viability was measured using the CellTiter-Glo luminescent cell
viability assay (Promega), according to the manufacturer's protocol. After 10 min
of incubation, luminescence was read on a Perkin Elmer EnVision reader, at a
speed of 1 s per well.

Chemical screening data analysis. Data were analysed in Genedata Screener
version 13.0, using the normalization method 'neutral controls', for which the
median of the 32 DMSO wells on each plate was set to 0% activity and a signal
of 0 was set to -100%. Positive controls (20 µM MG-132 and 20 µM bortezomib) were
included on all plates (16 wells each) but were not used for normalization owing
to variability in the response across cell lines. Dose-response curves were fit using
the Smart Fit strategy in Genedata. The percentage of effect was defined as the
high-concentration asymptote (Sinf) and the EC50 was the concentration at which
the fitted curve crossed the inhibitory value representing half of the maximal effect.
In addition, parameters were calculated at which the curve crossed
absolute inhibitory values of 30% or 50% regardless of maximal effect (AbsEC30 and
AbsEC50, respectively). AUC calculations were performed as previously described:
curves were fit with nonlinear sigmoid functions, forcing the low-concentration
asymptote to 1 using a three-parameter sigmoidal curve fit. The AUC for each
compound-strain pair was calculated by numerically integrating under the
eight-point concentration-response curve. For visualization purposes, drug response
curves were fit with a four-parameter log-logistic function, based on normalized
viability data from which the lowest-dose viability had been subtracted. Plots were
generated using the 'LL.4' function in the 'drc' package (https://cran.r-project.
org/web/packages/drc). To examine a potential link between proliferation rate
and differential drug response, doubling times were compared against the AUC
values of the differentially active compounds.
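
The numerical integration under an eight-point concentration-response curve can be illustrated with the trapezoid rule. This sketch makes two assumptions that the text does not state: integration is done over log2-transformed concentration, and the eight concentrations form a twofold dilution series. The viability values are invented and expressed as fractions of the DMSO control.

```python
import math


def auc_trapezoid(concentrations, viabilities):
    """Trapezoid-rule area under viability versus log2(concentration)."""
    xs = [math.log2(c) for c in concentrations]
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (viabilities[i] + viabilities[i - 1]) / 2
    return area


# Hypothetical eight-point, twofold dilution series (arbitrary units).
concentrations = [0.0625 * 2 ** i for i in range(8)]

inactive = [1.0] * 8  # compound with no effect at any dose
active = [1.0, 1.0, 0.9, 0.7, 0.4, 0.2, 0.1, 0.05]

auc_inactive = auc_trapezoid(concentrations, inactive)
auc_active = auc_trapezoid(concentrations, active)
```

Under these assumptions an inactive compound gives the maximal AUC (7.0 for eight points), and stronger responses give smaller values.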

Gene set enrichment analysis. Gene set enrichment analysis (GSEA) was
performed using the 10,174 genes best inferred from the connectivity map linear
model, also known as the BING gene set. Samples were divided into two classes
depending on the comparisons being made: samples with a genetic alteration
versus samples without it; samples sensitive to a drug (>30% inhibition) versus
samples insensitive to the same drug (<20% inhibition). Differential expression
was calculated using the signal-to-noise metric. A ranked gene list and
signal-to-noise values served as the input for the GSEA preranked module of GSEA,
using the Java app version 3.0. The analysis was run using the 'hallmark', 'KEGG',
'positional' and 'oncogenic signature' collections from the Molecular Signatures
Database (MSigDB) (http://software.broadinstitute.org/gsea/msigdb). To
compare between our MCF7 panel, CTD2 and GDSC, drug responses were downloaded
from the CTRP website (https://ocg.cancer.gov/programs/ctd2/data-portal;
v20.data.curves_post_qc file, updated 14 October 2015) and from the GDSC
website (http://www.cancerrxgene.org/downloads; log(IC50) and AUC values file,
updated 4 July 2016). Expression profiles were downloaded from the CCLE
website to match the CTD2 drug-response data (https://portals.broadinstitute.org/
ccle/data; 'CCLE_Expression_Entrez_2012-09-29.gct', updated 17 October 2012),
and from the GDSC website to match with the GDSC drug response data
(http://www.cancerrxgene.org/downloads; RMA normalized expression data for cell lines,
updated 2 March 2017). Expression profiles were filtered to include only the genes
that belong to the L1000 BING set. GSEA compared the expression patterns of the
five strains or cell lines with the highest AUC values for each matched drug with the
five strains/cell lines with the lowest AUC values for that drug. As the robustness
of gene expression signatures varies, this quantitative

50 well-defined hallmark GSEA gene sets.
Single-cell RNA-seq. MCF7 cells were cultured as described above. To follow
transcriptional changes after drug treatment, MCF7-AA cells were exposed to
500 nM bortezomib (Selleckchem, S1013) and collected before treatment, after
12 h of exposure, after 48 h of exposure or after 72 h of exposure followed
by drug wash and 24 h of recovery. Cells were washed, trypsinized, passed
through a 40-µm cell strainer, centrifuged and resuspended at a concentration
of 1,000 cells per µl in PBS containing 0.5% BSA. Single cells were processed
through the Chromium Single Cell 3' Solution platform using the Chromium
Single Cell Gel Bead, Chip and Library Kits (10X Genomics) per the
manufacturer's protocol. In brief, 7,000 cells were added to each channel and were then
partitioned into Gel Beads in emulsion in the Chromium instrument, where cell
lysis and barcoded reverse transcription of RNA occurred, followed by
amplification, shearing and 5' adaptor and sample index attachment. Libraries were
sequenced on an Illumina NextSeq 500.

Single-cell RNA-seq data analysis. Reads were mapped to the GRCh38 human
transcriptome using Cell Ranger version 2.1.0, and transcript-per-million values
were calculated for each gene in each filtered cell-barcoded sample. Transcript-per-
million values were then divided by 10, since the complexity of single-cell libraries
is estimated to be in the order of 100,000 transcripts. For each cell, we quantified
the number of expressed genes and the proportion of the transcript counts derived




from mitochondrial genes. Cells with either <1,000 detected genes or >0.15
mitochondrial fraction were excluded from further analysis. Finally, the resulting
expression matrix was filtered to remove genes detected in <3 cells. We focused
on highly variable genes for downstream principal component analysis (PCA).
For each dataset, we used the Seurat (http://satijalab.org/seurat) R package to
detect variable genes based on fitting a relationship between the mean and the
dispersion of each gene. We next scaled the data and regressed out unique molecular
identification number and mitochondrial gene fraction to remove technical noise.
The resulting scaled data were used as an input for PCA. Top significant principal
components, estimated by a manual inspection of the PCA standard deviations
'elbow' plots, were used to generate t-SNE plots. For each dataset, we used Seurat
(http://satijalab.org/seurat) to identify genes that vary between samples. To detect
differentially active pathways, gene ontology (GO) enrichment analysis was
performed with MSigDB (http://software.broadinstitute.org/gsea/msigdb) using
the differentially expressed genes that passed the following thresholds: |log fold
change| > 0.5, Bonferroni-corrected P < 0.01, the gene was detected in >10% of
the cells in each of the compared groups. Expression signatures for selected
pathways were downloaded from MSigDB. We evaluated the degree to which
individual cells express a certain expression signature by using a procedure that takes
into account the variability in signal-to-noise ratio, as previously reported. To
calculate pairwise cell distances, variable genes were detected, and the cell
embedding matrix for the top significant principal components was used to calculate the
Euclidean distance between every two cells within each sample.
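
The cell- and gene-level quality filters can be sketched in plain Python as below. The data layout (a dictionary of per-cell gene counts) and the function name are hypothetical; the defaults follow the thresholds quoted above, while the toy example uses much smaller thresholds so that it fits in a few lines.

```python
def filter_cells_and_genes(counts, mito_genes, min_genes=1000, max_mito=0.15, min_cells=3):
    """counts maps cell -> {gene: count}. Cells with fewer than min_genes
    detected genes or a mitochondrial fraction above max_mito are dropped;
    genes detected in fewer than min_cells remaining cells are then removed."""
    kept_cells = {}
    for cell, gene_counts in counts.items():
        detected = sum(1 for c in gene_counts.values() if c > 0)
        total = sum(gene_counts.values())
        mito = sum(c for g, c in gene_counts.items() if g in mito_genes)
        mito_fraction = mito / total if total else 0.0
        if detected >= min_genes and mito_fraction <= max_mito:
            kept_cells[cell] = gene_counts

    cells_per_gene = {}
    for gene_counts in kept_cells.values():
        for gene, c in gene_counts.items():
            if c > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, n in cells_per_gene.items() if n >= min_cells}

    return {
        cell: {g: c for g, c in gene_counts.items() if g in kept_genes}
        for cell, gene_counts in kept_cells.items()
    }


# Toy data with scaled-down thresholds, for illustration only.
toy_counts = {
    "c1": {"A": 5, "B": 3, "MT-CO1": 1},
    "c2": {"A": 2, "B": 1, "MT-CO1": 0},
    "c3": {"A": 1, "B": 0, "MT-CO1": 9},  # high mitochondrial fraction
    "c4": {"A": 4, "C": 2, "MT-CO1": 0},
    "c5": {"A": 7, "B": 0, "MT-CO1": 0},  # too few detected genes
}
filtered = filter_cells_and_genes(
    toy_counts, mito_genes={"MT-CO1"}, min_genes=2, max_mito=0.15, min_cells=2
)
```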

Analysis of genome-wide CRISPR screens. CERES dependency scores were
obtained from the Broad Institute Achilles website (https://portals.broadinstitute.
org/achilles/datasets/18/download). Owing to an unusually large difference in
screen quality between MCF7 and KPL1, the subtle differences in dependency
status between these lines were dominated by effects related to screen quality. To
remove these uninteresting sources of variation, we corrected CERES gene scores
by removing their first six principal components. These components were
well-explained by experimental batch effects related to screen performance and plasmid
DNA pool. Corrected dependency scores <-0.5 were defined as dependencies.
Genes listed as pan_dependent in the original dependency dataset were excluded
from further analysis. For a more stringent overlap comparison, genes with CERES
scores between -0.4 and -0.6 in MCF7 or KPL1 were further excluded. To
implement the force-directed layout described in Extended Data Fig. 11h, the full
corrected dependency matrix was reduced to its top 10 principal components
and a k-means clustering algorithm was run repeatedly on cell lines. Here, k is the
number of clusters, and the mean cluster size (number of cell lines) divided by k
is a parameter similar to perplexity in t-SNE, set to 6 for our data. Edges between
cells were weighted according to the frequency with which they clustered together,
with edges appearing less than 3% of the time ignored. Cells were then laid out
using the SFDP spring-block algorithm. Cell line RNA-seq gene expression data
and reverse-phase protein array protein expression data were obtained from the
CCLE website (https://portals.broadinstitute.org/ccle/data). Single-sample GSEA
was calculated using the ssGSEA algorithm.

Chymotrypsin-like activity. MCF7 cells were plated in triplicates in 96-well plates
at a density of 2,000 cells per well. After 24 h, chymotrypsin-like activity of the
proteasome was assayed, using the Proteasome-Glo assay (Promega), according
to the manufacturer's protocol. The activity levels were normalized to the relative cell
number that was measured using the fluorescent detection of resazurin dye
reduction (544-nm excitation and 590-nm emission).

Western blots. For PSMC2 and PSMD2 immunoblotting, cells were lysed in
HENG buffer (50 mM HEPES-KOH pH 7.9, 150 mM NaCl, 2 mM EDTA pH 8.0,
20 mM sodium molybdate, 0.5% Triton X-100, 5% glycerol), with protease inhibitor
cocktail (Roche Diagnostics, 11836153001). Protein concentrations were
determined by BCA assay (Thermo Fisher Scientific, 23227) and proteins were resolved
using SDS-PAGE for immunoblot analysis. Antibodies against the following
human proteins were used: α-tubulin (ab80779, Abcam), PSMC2 (MSS1-104, Enzo
Life Sciences) and PSMD2 (C-7, Santa Cruz). Visualization was performed using
the ChemiDoc MP System (Bio-Rad) and Image Lab Software (Bio-Rad) was used
to quantify relative band intensities. For ER immunoblotting, cells were lysed
with a mix of 4× protein loading buffer (Li-Cor, 928-0004) and 10× NuPAGE
sample reducing agent (Life Technologies, NP0009). Protein concentration was
normalized by cell counting and proteins were resolved by SDS-PAGE. Antibodies
against the following human proteins were used: β-actin (N-21, Santa Cruz), ER
(F-10, Santa Cruz). Visualization was performed using the Odyssey CLx imaging
machine (Li-Cor) and Image Studio Software (Li-Cor) was used to quantify the
relative intensities.

Generation and comparison of dendrograms. Dendrograms were constructed
using Euclidean distances for continuous measures and Manhattan distances for
discrete measures. Complete linkage hierarchical clustering was performed in all
cases. The mutation status dendrogram was based on mutations with AF > 0.05.
The gene expression dendrogram was based on the 978 landmark genes directly
measured by the L1000 assay. The copy number dendrograms were based on
discrete calls (loss, normal or gain) assigned to each event based on its log2 copy
number ratio, using a cutoff value of 0.1. The drug-response dendrogram was
based on normalized viability values. The cell morphology dendrogram was based
on the full list of the 1,754 measured cellular features. The barcode representation
dendrogram was based on the log2-transformed number of reads, including only
barcodes with >1,000 reads in at least one sample. To understand how
dendrograms from different sources compared, the Fowlkes-Mallows index was used,
as it could capture similarities in global clustering while ignoring within-group
variance. The 'Bk' function in the dendextend R package was used for
computations and visualizations. We compared dendrograms from different sources
with k values ranging from 5 to 26. A background distribution was calculated by
randomly shuffling the labels of the trees 1,000 times, and calculating Bk values.
The 95% upper quantile of the randomized distribution for each k was plotted.
The maximum Bk value was used to estimate the degree of similarity between the
compared pair of dendrograms.
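
For readers unfamiliar with the Fowlkes-Mallows index, the sketch below computes it for two flat clusterings of the same items and builds a label-shuffling background. It is a plain-Python illustration, not the dendextend implementation: cutting each dendrogram into k clusters is assumed to have been done already, and the function names are hypothetical.

```python
import random
from itertools import combinations


def fowlkes_mallows(labels_a, labels_b):
    """Bk = TP / sqrt((TP + FP) * (TP + FN)), counted over item pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1
        elif same_a:
            fp += 1
        elif same_b:
            fn += 1
    if tp + fp == 0 or tp + fn == 0:
        return 0.0
    return tp / ((tp + fp) * (tp + fn)) ** 0.5


def shuffled_background(labels_a, labels_b, n_perm=1000, seed=0):
    """Bk values obtained after randomly shuffling one set of labels."""
    rng = random.Random(seed)
    shuffled = list(labels_b)
    values = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        values.append(fowlkes_mallows(labels_a, shuffled))
    return sorted(values)


identical = fowlkes_mallows([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = fowlkes_mallows([0, 0, 1, 1], [0, 1, 0, 1])
background = shuffled_background([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2], n_perm=200)
upper_95 = background[int(0.95 * len(background)) - 1]
```

An observed Bk above `upper_95` would indicate more agreement between the two clusterings than expected from shuffled labels.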

Calculation of the distances between strains based on their genomic features.
CNA distance based on ultra-low-pass whole-genome DNA sequencing was
determined by the fraction of the genome affected by discordant CNA calls. CNA and
SNV distances based on targeted sequencing were determined by Jaccard indices,
defined as the number of shared events between strains (intersection) divided by
the total number of events in these strains (union). For SNVs, both the mutated
gene and the exact amino acid change had to be identical to be counted as a shared
event. Gene expression distances were defined as the Euclidean distances between
L1000 expression profiles. Drug-response distances were defined as the Euclidean
distances between drug-response profiles, after limiting the drug set to active drugs
only (that is, drugs that reduced the viability of at least one strain by >50%) and
setting the threshold for viability values to ≤100%.
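
The Jaccard index on SNV events can be illustrated as follows. Each event is represented as a (gene, amino acid change) pair so that both must match to count as shared; the two mutation sets are invented for the example and do not correspond to any real strain.

```python
def jaccard_index(events_a, events_b):
    """Shared events (intersection) divided by all events (union)."""
    a, b = set(events_a), set(events_b)
    union = a | b
    if not union:
        return 1.0  # two empty profiles are treated as identical (assumption)
    return len(a & b) / len(union)


# Hypothetical SNV profiles of two strains.
strain_x = {("TP53", "R175H"), ("PIK3CA", "E545K"), ("GATA3", "D336fs")}
strain_y = {("TP53", "R273H"), ("PIK3CA", "E545K"), ("GATA3", "D336fs")}

similarity = jaccard_index(strain_x, strain_y)
```

The two TP53 events affect the same gene but differ in the amino acid change, so they are not counted as shared: two of the four distinct events are common, giving an index of 0.5.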

Comparisons across CCLE cell lines. Gene-level mRNA expression, copy
number and mutation status data were downloaded from the CCLE website
(https://portals.broadinstitute.org/ccle/data: 'CCLE_Expression_Entrez_2012-09-29.gct',
updated 17 October 2012; 'CCLE_copynumber_byGene_2013-12-03.txt', updated
27 May 2014; 'CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct', updated
29 February 2016). The total number of point mutations and copy number changes
were counted for each cell line. Chromosome arm-level events in CCLE samples
were generated as previously described, and the number of arm-level events was
counted for each cell line. The fraction of the genome affected by subclonal events
was estimated using ABSOLUTE. Combined CNA-SNV genomic instability
scores were calculated as described previously. The DNA repair gene set was
derived from MSigDB (http://software.broadinstitute.org/gsea/msigdb), using
the 'DNA_Repair' GO signature. The CIN70 gene set was derived from a
previous publication. For each gene set, genes that were not expressed at all in the
CCLE dataset were removed, and the remaining gene expression values were
log2-transformed and scaled by subtracting the gene expression means. The signature
score was defined as the sum of these scaled gene expression values.
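
A minimal sketch of the signature score is shown below. The pseudocount of 1 inside the log2 transform is an assumption added so that zero values can be handled, and the expression matrix and gene names are placeholders.

```python
import math


def signature_scores(expression, gene_set):
    """expression maps gene -> list of values (one per cell line).
    Genes with no expression in any cell line are removed; the rest are
    log2-transformed, mean-centred per gene and summed per cell line."""
    n_lines = len(next(iter(expression.values())))
    scores = [0.0] * n_lines
    for gene in gene_set:
        values = expression.get(gene)
        if values is None or not any(v > 0 for v in values):
            continue  # not expressed at all in the dataset
        logged = [math.log2(v + 1) for v in values]  # pseudocount is an assumption
        mean = sum(logged) / n_lines
        for i, v in enumerate(logged):
            scores[i] += v - mean
    return scores


# Placeholder expression values for three cell lines.
expression = {"G1": [1, 3, 7], "G2": [0, 0, 0], "G3": [3, 3, 3]}
scores = signature_scores(expression, ["G1", "G2", "G3"])
```

Because each gene is mean-centred, the scores sum to zero across cell lines and only rank the lines relative to one another.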
Comparison of Broad (CCLE) and Sanger (GDSC) genomic features.
Whole-exome sequencing data for 107 matched cell lines were downloaded from the
Sanger Institute (http://cancer.sanger.ac.uk/cell_lines, EGA accession number:
EGAD00001001039) for the GDSC cell lines, and from the GDC portal (https://
portal.gdc.cancer.gov/legacy-archive) for the CCLE cell lines. For copy number
analysis, the GATK4 somatic copy number variant pipeline was applied
(https://gatkforums.broadinstitute.org/gatk/discussion/9143/how-to-call-somatic-copy-number-variants-using-gatk4-cnv).
Gene-level copy number calls were generated
by mapping genes from segment calls using the Consensus Coding Sequence
database. The gene-level values were log2-transformed, and converted to discrete
values using predefined thresholds (±0.1, ±0.3 and ±0.5). To determine the
percentage of discordance for each cell line, the number of discordant CNA calls
between each pair of strains was divided by the total number of genes (excluding
genes with a neutral copy number call in both datasets). For analysis of somatic
variants, the CCLE-Sanger merged mutation calls were downloaded from the
CCLE portal (https://portals.broadinstitute.org/ccle/data), and target interval list
files were generated for each of the 107 matched cell lines in CCLE. Mutation
calling was performed using MuTect, with default parameters and 'force output'
enabled, to count the number of reads supporting the reference and alternate alleles
for each variant in each cell line. For analysis of germline variants, a common target
interval list file that consisted of a panel of 105,995 SNPs was generated, based on
common SNVs found in 1,019 CCLE RNA-seq samples, and MuTect was applied
with the same parameters as described above. Comparison of allelic fractions was
performed using the subset of variants with minimum depth of coverage of 10 in
both Sanger and CCLE datasets and with minimum allelic fraction of 0.1 in at
least one dataset. Out of the 107 cell lines, one cell line (Dov13) lacked any germline
concordance and was thus excluded from all analyses.

Cytogenetic analysis. Karyotyping was performed by KaryoLogic
(www.karyologic.com) on 30 G-banded metaphase spreads per sample. Every spread
displayed multiple chromosomal rearrangements with many marker
chromosomes. A marker was defined as a structurally abnormal chromosome that cannot
be unambiguously identified by conventional banding cytogenetics. The analysis
was performed according to the International System for Human Cytogenetic
Nomenclature (ISCN) 2016 guidelines. Rare metaphases with >100 chromosomes
were excluded from further analysis.

e-karyotyping analysis. RNA-seq data from non-manipulated and non-treated
samples of the near-diploid human cell line RPE1 were downloaded from the NCBI
SRA website (https://www.ncbi.nlm.nih.gov/sra). STAR paired aligner was used
to align paired-end samples and STAR non-paired aligner was used to align the
non-paired samples. The STAR to RSEM tool was used to generate the
gene-level expression values using the GTEx pipeline (https://github.com/broadinstitute/
gtex-pipeline). To infer arm-level copy number changes from gene expression
profiles, the RSEM values were analysed using the e-karyotyping method. In brief,
RSEM values were log2-converted, genes that were not expressed (log2RSEM < 1)
in >20% of the samples were excluded, and expression levels of the remaining
genes were floored to log2RSEM = 1. The median expression value of each gene across
all samples was subtracted from the expression value of that gene in order to obtain
comparative values. The 10% most variable genes were removed from the dataset
to reduce transcriptional noise. The relative gene expression data were then
subjected to a CGH-PCF analysis, with a stringent set of parameters: Least allowed
deviation = 0.5; Least allowed aberration size = 30; Winsorize at quantile = 0.001;
Penal 1. CNAs exceeding 4% of the length of a chromosome
arm were called arm-level CNAs.
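
The expression preprocessing that precedes the CGH-PCF step can be sketched as below. This is an illustration under stated assumptions: the flooring step is read as raising values below log2RSEM = 1 to 1, variability is measured by the variance of the relative values, and the input dictionary and function name are hypothetical. The segmentation itself is not reproduced.

```python
import math
from statistics import median, pvariance


def preprocess(rsem, min_log2=1.0, max_unexpressed=0.2, drop_variable=0.1):
    """rsem maps gene -> list of RSEM values (one per sample) and returns
    gene -> median-centred log2 values after the filters described above."""
    relative = {}
    for gene, values in rsem.items():
        logged = [math.log2(v) if v > 0 else float("-inf") for v in values]
        unexpressed = sum(1 for v in logged if v < min_log2)
        if unexpressed / len(logged) > max_unexpressed:
            continue  # not expressed in more than 20% of the samples
        floored = [max(v, min_log2) for v in logged]
        centre = median(floored)
        relative[gene] = [v - centre for v in floored]

    # Remove the most variable genes to reduce transcriptional noise.
    n_drop = int(len(relative) * drop_variable)
    ranked = sorted(relative, key=lambda g: pvariance(relative[g]), reverse=True)
    dropped = set(ranked[:n_drop])
    return {g: vals for g, vals in relative.items() if g not in dropped}


# Toy input: ten expressed genes of increasing variability and one
# poorly expressed gene ("low").
toy_rsem = {f"g{i}": [4, 4, 4, 4, 4 * 2 ** i] for i in range(10)}
toy_rsem["low"] = [1, 1, 1, 8, 8]
relative_expression = preprocess(toy_rsem)
```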

Comparison of arm-level CNAs between cell line propagation and tumour
progression. Recurrence of chromosome arm-level CNAs during breast cancer
progression was determined by their frequency in TCGA samples, as previously
described. Recurrence of chromosome arm-level CNAs during cell line
propagation was determined by comparing the arm-level calls of the strains directly
separated by extensive passaging (strain D versus strain L versus strain AA; strain
B versus strains I and P), as shown in Extended Data Fig. 2a. Only arms that
are recurrently gained or lost (but not both) in TCGA (Q = 0.03) and that have
variable copy number status across the MCF7 panel were considered for the
comparison.

Statistical analysis, The significance of he dilfeence between genomic instability 
‘sssciated with diferent sources of genetic variation and diference between chro- 
‘osome numbers at twatime points af single-cll- derived clones was determined 
‘sing the two-talled Wileaxon rank-sum text, The significance ofthe diference in 
the Euclidean distance betwen compounds that work through the same Mo and 
‘compounds that work though ditirent Moa, in the discordance of non-silent 
'SNVs at different stages of transformation in chromosomal instability (CIN70) 
and weighted-genomic integrity index (wGHl) scores between cell lines derived 
{rom primary tamours and thore derived rom metastases, between the somatic 
and germline SNV Pearson correlation ofthe Broad-Sanger cell ins, inthe 
Broad Sanger somatic SNV concordance between microsatelite-stable and mucro- 
satellite unstable cel ines and between primary tumour derived and metastsis- 
<erived cel ines was determined using one-tailed Wilcoxon rank-sum test. The 
"Significance ofthe difference in mutation celular prevalence across strains was 
determined by Kruskal-Wallis text. The significance ofthe difference in AKT 
Inhibitor LV sensitivity between PTEN"” and PTEN* strains inthe relative 
{growth effect of ER depletion hetween ESR! loss and no-ESRI Ios strain peo- 
teasome activity between bortezomib-senstve and bortezomib-insensitivestrins, 
{in ER protein expression levels between strains and in the namberof ara level 
(CNAs between matched eary-late MCE strains was determined using a one-aled 
Student ts. The significance of theditference in doubling times and in sensivty 
tw oestrogen depletion was determined using two-tailed Student’ test. The 
significance of the correlation between the two replicates ofthe primary seen 
‘was determined using Pearsons correlation, The significance of the correlation 
between doubling time and the number of protein-coding mutations, the
correlation between doubling time and the fraction of subclonal mutations, and
the correlation between doubling time and drug response was determined using
Spearman's correlation, excluding the broadly resistant strains Q and M. The
significance of the correlation between ESR1 CERES dependency scores and
oestrogen signalling and between GATA3 CERES dependency scores and GATA3 protein
expression levels was determined using Spearman's correlation. The deviation of
the doubling-time-drug-response correlations from a hypothetical mean value of 0
was determined using a two-tailed one-sample t-test. The significance of the
difference between the emergence and disappearance of recurrent arm-level CNAs
during cell line propagation was determined using McNemar's test. The
significance of the correlation between the primary and secondary drug screens
was determined using Spearman's correlation (including only compounds that were
active in both screens). The significance of the directionality of drug-pathway
association, and the likelihood that a mutation would be clonal given the number
of reads that detected it, were determined using a binomial test. The
significance of the difference in
the fraction of pathways correctly identified between the MCF7 panel, CTD² and
GDSC was determined using a two-tailed Fisher's exact test. GSEA P values and
FDR-corrected Q values are shown as provided by the default analysis output. For
the comparison of pathway prediction shown in Supplementary Table 32, FDR
Q values were recalculated using only the pre-selected pathways. Thresholds for
significant associations were determined as: P < 0.05; Q < 0.25. The
significance of the difference in the karyotypic variation between parental and
single-cell-clone-derived cultures was determined using Levene's test. The
significance of differentially expressed genes in the single-cell RNA-seq data
was determined by an analysis of variance (ANOVA) followed by a Games-Howell
post hoc test and a Bonferroni correction. Box plots show the median, 25th and
75th percentiles; lower whiskers show data within the 25th percentile − 1.5× the
interquartile range (IQR), upper whiskers show data within the 75th percentile
+ 1.5× the IQR, and circles show the actual data points. Statistical tests were
performed using the R statistical software (http://www.r-project.org/), and the
box plots and violin plots were generated using the 'boxplot' and 'vioplot' R
packages, respectively.
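
The box-plot convention described above can be illustrated with a short sketch. The analyses in this study were performed in R; the Python function below is only an illustration, and its name (box_plot_summary), the example values and the use of linearly interpolated quartiles are assumptions rather than details taken from the study.

```python
from statistics import median, quantiles


def box_plot_summary(values):
    """Summarize a sample using the box-plot convention described in the text.

    Hypothetical helper for illustration only: the quartile method
    (linear interpolation, 'inclusive') is an assumption and may differ
    slightly from the hinges used by R's boxplot function.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences: 25th percentile - 1.5 x IQR and 75th percentile + 1.5 x IQR.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers extend to the most extreme data points inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]

    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outside": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up example values (not data from the study).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(summary)
```

In this made-up example the value 40 lies above the upper fence, so the upper whisker stops at 9 and 40 would be drawn only as an individual data point.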

Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Code availability. The code used to generate and/or analyse the data during the
current study is publicly available, or available from the corresponding authors
upon request.

Data availability. The datasets generated during and/or analysed during the
current study are available within the article, its Supplementary Information or
from the corresponding authors upon request. DNA sequencing data were deposited
to SRA with BioProject accession number PRJNA398960. Single-cell RNA-seq
data were deposited to the Gene Expression Omnibus (GEO, accession number
GSE114462). Source Data of all immunostaining blots are available in the online
version of this paper. The cell divergence portal is accessible at
https://cellstrainer.broadinstitute.org.


30. Bray, M.-A., Fraser, A. N., Hasaka, T. P. & Carpenter, A. E. Workflow and
    metrics for image quality control in large-scale high-content screens.
    J. Biomol. Screen. 17, 266-274 (2012).

31. Dao, D. et al. CellProfiler Analyst: interactive data exploration, analysis
    and classification of large biological image sets. Bioinformatics 32,
    3210-3212 (2016).

32. Adalsteinsson, V. A. et al. Scalable whole-exome sequencing of cell-free DNA
    reveals high concordance with metastatic tumors. Nat. Commun. 8, 1324
    (2017).

33. Ha, G. et al. Integrative analysis of genome-wide loss of heterozygosity and
    monoallelic expression at nucleotide resolution reveals disrupted pathways
    in triple-negative breast cancer. Genome Res. 22, 1995-2007 (2012).

34. Sholl, L. M. et al. Institutional implementation of clinical tumor profiling
    on an unselected cancer population. JCI Insight 1, e87062 (2016).

35. Li, H. & Durbin, R. Fast and accurate short read alignment with
    Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009).

36. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for
    analyzing next-generation DNA sequencing data. Genome Res. 20, 1297-1303
    (2010).

37. DePristo, M. A. et al. A framework for variation discovery and genotyping
    using next-generation DNA sequencing data. Nat. Genet. 43, 491-498 (2011).

38. Cibulskis, K. et al. Sensitive detection of somatic point mutations in
    impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213-219
    (2013).


39. McLaren, W. et al. Deriving the consequences of genomic variants with the
    Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069-2070 (2010).

40. Olshen, A. B., Venkatraman, E. S., Lucito, R. & Wigler, M. Circular binary
    segmentation for the analysis of array-based DNA copy number data.
    Biostatistics 5, 557-572 (2004).

41. Abo, R. P. et al. BreaKmer: detection of structural variation in targeted
    massively parallel sequencing data using kmers. Nucleic Acids Res. 43, e19
    (2015).

42. Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide
    libraries for CRISPR screening. Nat. Methods 11, 783-784 (2014).

43. Joung, J. et al. Genome-scale CRISPR-Cas9 knockout and transcriptional
    activation screening. Nat. Protocols 12, 828-863 (2017).

44. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray
    expression data using empirical Bayes methods. Biostatistics 8, 118-127
    (2007).

45. Rees, M. G. et al. Correlating chemical sensitivity and basal gene expression
    reveals mechanism of action. Nat. Chem. Biol. 12, 109-116 (2016).

46. Golub, T. R. et al. Molecular classification of cancer: class discovery and
    class prediction by gene expression monitoring. Science 286, 531-537 (1999).

47. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of
    individual cells using nanoliter droplets. Cell 161, 1202-1214 (2015).

48. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic
    melanoma by single-cell RNA-seq. Science 352, 189-196 (2016).

49. Meyers, R. M. et al. Computational correction of copy number effect improves
    specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet.
    49, 1779-1784 (2017).

50. Hu, Y. Efficient, high-quality force-directed graph drawing. Math. J. 10,
    37-71 (2005).

51. Barbie, D. A. et al. Systematic RNA interference reveals that oncogenic
    KRAS-driven cancers require TBK1. Nature 462, 108-112 (2009).

52. Fowlkes, E. B. & Mallows, C. L. A method for comparing two hierarchical
    clusterings. J. Am. Stat. Assoc. 78, 553-569 (1983).

53. Ben-David, U. et al. Patient-derived xenografts undergo mouse-specific tumor
    evolution. Nat. Genet. 49, 1567-1575 (2017).

54. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in
    human cancer. Nat. Biotechnol. 30, 413-421 (2012).

55. Zhang, S., Yuan, Y. & Hao, D. A genomic instability score in discriminating
    nonequivalent outcomes of BRCA1/2 mutations and in predicting outcomes of
    ovarian cancer treated with platinum-based chemotherapy. PLoS ONE 9,
    e113169 (2014).

56. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based
    approach for interpreting genome-wide expression profiles. Proc. Natl Acad.
    Sci. USA 102, 15545-15550 (2005).

57. Carter, S. L., Eklund, A. C., Kohane, I. S., Harris, L. N. & Szallasi, Z.
    A signature of chromosomal instability inferred from gene expression
    profiles predicts clinical outcome in multiple human cancers. Nat. Genet.
    38, 1043-1048 (2006).

58. Pujar, S. et al. Consensus coding sequence (CCDS) database: a standardized
    set of human and mouse protein-coding regions supported by expert curation.
    Nucleic Acids Res. 46, D221-D228 (2018).

59. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics
    29, 15-21 (2013).

60. Li, B., Ruotti, V., Stewart, R. M., Thomson, J. A. & Dewey, C. N. RNA-Seq
    gene expression estimation with read mapping uncertainty. Bioinformatics
    26, 493-500 (2010).

61. Ben-David, U., Mayshar, Y. & Benvenisty, N. Virtual karyotyping of
    pluripotent stem cells on the basis of their global gene expression
    profiles. Nat. Protocols 8, 989-997 (2013).

62. Krijthe, J. H. Rtsne: T-Distributed Stochastic Neighbor Embedding using a
    Barnes-Hut Implementation. https://github.com/jkrijthe/Rtsne (2015).


© 2018 Springer Nature Limited. All rights reserved.



Extended Data Fig. 1 | Comparison of Broad and Sanger genomic
features across 106 cell lines. a, Comparison of the Pearson correlations
of germline versus somatic SNVs across 106 paired cell lines. b, A
histogram of the distribution of mutation discordance fractions across cell
lines. Black, the distribution of all non-silent SNVs; grey, the distribution
of the 447 genes included in the Oncopanel. c, Comparison of the fraction
of discordant gene-level CNAs between the Broad and the Sanger (n = 106
cell lines) datasets, using three different thresholds for CNA calling. Bar,
median; box, 25th and 75th percentiles; whiskers, 1.5× IQR of lower and
upper quartile; circles, data points. d, A histogram of the distribution
of CNA discordance fractions across cell lines. Bars are coloured as in
b. e, CNA landscapes of 11 paired cell lines. For each cell line, the CNA
landscape of the Broad strain (top) and the Sanger strain (bottom) are
shown. Red, copy number gains; blue, copy number losses. CNAs <10 Mb
in size are not presented. f, A histogram of the fraction of the genome
affected by subclonal events across 916 cell lines from the CCLE. MCF7
and A549 are denoted by arrows. g, All CCLE cell lines ranked by their
aneuploidy scores. h, All CCLE cell lines ranked by the number of
their gene-level CNAs. i, All CCLE cell lines ranked by the number of
their gene-level SNVs. j, All CCLE cell lines ranked by their chromosomal
instability (CIN70) signature scores. k, All CCLE cell lines ranked by
their DNA-repair signature scores. l, All CCLE cell lines ranked by
their genomic instability scores. m, All CCLE cell lines ranked by their
subclonal genome fraction. The vertical black line shows the rank of
MCF7 in each comparison. n, Comparison of gene expression variation
across multiple strains of nine cell lines, including MCF7. Box plots show the
standard deviations of the expression levels for the 978 landmark genes
directly measured in L1000. Bar, median; box, 25th and 75th percentiles;
whiskers, data within 1.5× IQR of lower or upper quartile; circles, all data
points.

Extended Data Fig. 2 | Schematic representation of the MCF7 and A549
strains included in the current study. a, MCF7 strains included in this
study; their origins (columns), years of acquisition (rows), manipulations
(colours) and progeny relationships (lines) are shown. b, A table
corresponding to a. c, A549 strains included in this study; their origins
(columns), years of acquisition (rows), manipulations (colours) and
progeny relationships (lines) are shown. d, A table corresponding to c.

Extended Data Fig. 3 | Genetic variation across 27 MCF7 strains.
a, Variation in the copy number status of nine selected genes across
27 MCF7 strains. Red, copy number gains; blue, copy number losses.
Thresholds for relative gains and losses were set at 0.1 and −0.1,
respectively. b, Western blots of the relative protein expression levels of
ERα across strains. The expression of β-actin was used for normalization.
For gel source data, see Supplementary Fig. 1. The experiment was
repeated twice with similar results. c, Quantification of the relative
expression of ERα. Strains Q and M were excluded from the comparison.
Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5×
IQR of lower or upper quartile; circles, all data points. One-tailed t-test.
d, The allelic fractions of non-silent mutations in seven selected genes
across 27 MCF7 strains. e, The number of non-silent point mutations
(SNVs) across the 27 MCF7 strains. f, The number of COSMIC
non-silent point mutations shared by each number of MCF7 strains.
g, Top, unsupervised hierarchical clustering of 27 MCF7 strains, based
on all of their SNVs. Groups of strains expected to cluster together based
on their evolutionary history are highlighted, as in Fig. 1. Bottom,
corresponding heat map, showing the mutation status of all mutations
across the 27 MCF7 strains. Mutations that were identified in only a subset
of the strains, and that were detected in above 5% of the reads (AF > 0.05),
are shown. Yellow, presence of a mutation; grey, absence of a mutation. h, The
number of large (>15-bp) indels and rearrangements across the 27 MCF7
strains. Grey, indels; black, rearrangements. i, The recurrence of structural
variants in each of the 42 (out of 60) genes for which at least one event was
detected. j, The number of structural variants shared by each number of
MCF7 strains. Note that this analysis is limited to the 60 genes listed in
Supplementary Table 2.

Extended Data Fig. 4 | Comparison of CNA landscapes between MCF7
strains. a, CNA landscapes of a pair of MCF7 strains separated from each
other by extensive passaging. b, CNA landscapes of a pair of MCF7
strains separated from each other by a genetic manipulation (introduction
of GFP reporter). c, CNA landscapes of 10 MCF7 strains separated by
multiple freeze-thaw cycles, with little passaging in between. d, CNA
landscapes of a pair of MCF7 strains that were either cultured in vitro
(top) or cultured in vivo and treated with tamoxifen (bottom). e, CNA
landscapes of a pair of MCF7 strains separated from each other by seven
passages. f, CNA landscapes of a pair of MCF7 strains before (top) and
after (bottom) the introduction of Cas9. g, CNA landscapes of a pair of
[...] obtained from four different sources. CNA landscapes
of MCF7 strains separated from each other by extensive passaging.
Data points represent 1-Mb bins throughout the genome. Red, gains; blue,
losses; black, normal copy numbers; yellow, differential CNAs between the
compared strains.

Extended Data Fig. 5 | Characterization of the variation in SNV allelic
fraction and cellular prevalence across 27 MCF7 strains and their
single-cell-derived clones. a, Top, unsupervised hierarchical clustering
of 27 MCF7 strains, based on the allelic fractions of all their SNVs.
Groups of strains expected to cluster together based on their evolutionary
history are highlighted, as in Fig. 1. Bottom, a corresponding heat map,
showing the allelic fractions of all mutations across the 27 MCF7 strains.
Mutations that were identified in only a subset of the strains are shown.
The presence of a mutation is shown in colour according to its allelic
fraction. b, The allelic fractions of an activating PIK3CA mutation (top)
and an inactivating TP53 mutation (bottom) across strains. c, Top,
unsupervised hierarchical clustering of 27 MCF7 strains based on their
SNV cellular prevalence. Groups of strains expected to cluster together
based on their evolutionary history are highlighted, as in Fig. 1. Bottom, a
corresponding heat map, showing the cellular prevalence of all mutations
across the 27 MCF7 strains. Mutations that were identified in only a
subset of the strains are shown. The presence of a mutation is shown
in colour according to its cellular prevalence. d, The distribution of the
maximal differences in cellular prevalence (CP) of non-silent mutations
across 27 MCF7 strains. The peak at maximum ΔCP ~ 1 represents SNVs
that are clonal in at least one strain but are nearly or completely absent
in at least one other strain; the peak at maximum ΔCP ~ 0 represents
SNVs that are detected at similar prevalence across all 27 strains; and
the peak at maximum ΔCP ~ 0.1 represents a group of SNVs present at
CP ~ 0.1 only in strain M. e, Description of the MCF7 single-cell-derived
clones included in this study, including their parental cell line, genetic
manipulations and relationship to one another. f, A heat map showing the
allelic fractions of non-silent mutations in three wild-type
single-cell-derived MCF7 (scWT3-scWT5) clones and the parental population.
The presence of a mutation is shown in colour according to its allelic
fraction. g, A heat map showing the allelic fractions of non-silent
mutations in five genetically manipulated single-cell-derived MCF7
clones. For two of the clones, samples were passaged for a prolonged
time and sequenced at multiple time points. The presence of a mutation
is shown in colour according to its allelic fraction. h, Comparison
of the karyotypic variation between parental and single-cell-derived
cell populations. Histograms show the distribution of chromosome
numbers from the parental (light grey) and single-cell-derived (dark
grey) populations. P values indicate the significance of the differences
between the variations (rather than the means) of the populations
using a one-tailed Levene's test (50 metaphases per group). i, Two
representative karyotypes of each sample. Note that all single-cell-derived
clones are karyotypically heterogeneous. Marker chromosomes are not
shown. Arrows point to partially aberrant chromosomes. Images are
representative of 50 metaphases counted per sample. j, Two representative
karyotypes from two cell populations of the same single-cell-derived clone,
separated by six months of culture propagation. Marker chromosomes are
not shown. Arrows point to partially aberrant chromosomes. Images are
representative of 50 metaphases counted per sample. k, Comparison of the
karyotypic variation between two cell populations of the same
single-cell-derived clone, separated by six months of culture propagation.
Histograms show the distribution of chromosome numbers from the early
(light grey) and late (dark grey) populations. Per sample, 50 metaphases
were counted. The P value indicates the significance of the difference
between the means of the populations using a two-tailed Wilcoxon rank-sum
test.

Extended Data Fig. 6 | Transcriptomic variation across 27 MCF7
strains and their single-cell-derived clones. a, Comparison of the
L1000-based MCF7 expression profiles to microarray-based expression
profiles from CCLE. Histograms show the distributions of the Spearman
correlations between the 27 MCF7 strains and either MCF7 (light purple),
two MCF7 derivatives (dark purple and blue), other breast cancer cell
lines (green) or non-breast cancer cell lines (grey). The comparison is
based on the 978 landmark genes directly measured in L1000. b, The
number of differentially expressed genes identified in all possible pairwise
comparisons of MCF7 strains, using a twofold change cutoff. LFC, log
fold change; DEGs, differentially expressed genes. c, The 10 top hallmark
gene sets identified by GSEA to be significantly enriched among the
genes that are most differentially expressed across the MCF7 strains.
The two gene sets related to oestrogen response are highlighted in red.
d, Comparison of gene expression variation within and between strains.
Histograms show the distributions of gene expression variation within
replicates of the same strain (grey), between closely related strains
(purple) and between all strains (green). The comparison is based on the
978 landmark genes directly measured in L1000. e, Heat map showing
the arm-level CNA profiles of 27 MCF7 strains. Red, gains; blue, losses.
f, GSEA reveals downregulation of the genes on chromosomes 10q, 17q
and 21q in strains that have lost copies of these arms, and upregulation
of the genes on chromosomes 5q, 6p, 11q and 16p in strains that have
gained copies of these arms. g, GSEA of the upregulation of mTOR
signalling (gene set: hallmark_MTORC1_signalling) and of genes that are
upregulated when PTEN is knocked down (gene set: PTEN_DN.V2_UP)
in strains that have gained PIK3CA; downregulation of the oestrogen
response signature (gene set: hallmark_oestrogen_response_late) in
strains that have lost ESR1; cell cycle signature (gene set:
KEGG_cell_cycle) in strains that have lost CDKN2A; and downregulation of
KRAS signalling (gene set: hallmark_KRAS_signalling_DN) in strains that have
lost MAP2K4. h, GSEA of the upregulation of mTOR signalling (gene
set: hallmark_MTORC1_signalling) in strains with high prevalence of an
activating PIK3CA mutation; upregulation of genes that are upregulated
when PTEN is knocked down (gene set: PTEN_DN.V1_UP) in strains that
have an inactivating PTEN mutation; and downregulation of genes that
are downregulated when TP53 is knocked down (gene set: P53_DN.V1_DOWN)
in strains with high cellular prevalence of an inactivating TP53
mutation. i, GSEA reveals upregulation of mTOR signalling (gene sets:
MTOR_UP.N4.V1_UP and hallmark_MTORC1_signalling) in strains that
have both PTEN copy number loss and an inactivating PTEN mutation.
j, A t-SNE plot of single-cell RNA-seq data from MCF7-AA cells treated
with bortezomib (500 nM) at different time points. Each dot represents
a single cell, and cells are coloured by time point. k, Comparison of the
proteasome gene expression signature across time points. l, Comparison
of the unfolded protein response gene expression signature across time
points. m, Comparison of two proliferation gene expression signatures,
S (left) and G2M (right), across time points. n, Comparison of the early
(left) and late (right) response to oestrogen gene expression signatures
across time points. Red lines denote mean values. P values indicate
significance from a one-way ANOVA followed by a Games-Howell post
hoc test. n = 1,726, 2,743, 1,851 and 1,235 cells for the four time points,
respectively. o, A t-SNE plot of single-cell RNA-seq data from a parental
population and its single-cell-derived clone at two time points. Each dot
represents a single cell, and cells are coloured by sample. p, Comparison
of the transcriptional heterogeneity between a parental MCF7 population
and its single-cell-derived clones. n = 2,904, 2,990, 3,896 and 4,583 cells
for parental, scWT3, scWT4 and scWT5, respectively. q, Comparison of
the transcriptional heterogeneity between two cultures of the same
single-cell clone, separated by six months of continuous passaging. n = 4,295
and 4,116 cells for clone9-May17 and clone9-Nov17, respectively. Box plots
show the Euclidean distance between the cells in each cell population.
Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5×
IQR of lower or upper quartile. P values indicate significance from a
one-way ANOVA followed by a Games-Howell post hoc test. r, The 10
top hallmark gene sets identified by GSEA to be significantly enriched
among the top differentially expressed genes between the two cultures of
clone MCF7_GREB1_9 (May 2017 versus November 2017). The gene sets
related to oestrogen response are highlighted in red, and those related to
proliferation are highlighted in green.

Extended Data Fig. 7 | Extensive genetic and transcriptional variation
across 23 strains of A549. a, Top, unsupervised hierarchical clustering of
23 A549 strains, based on their non-silent SNV profiles derived from deep
targeted sequencing. Strains expected to cluster together based on their
evolutionary history are highlighted in blue. Bottom, a corresponding
heat map, showing the mutation status of non-silent mutations across
the 23 A549 strains. Mutations that were identified in only a subset of
the strains, and that were detected in above 5% of the reads (AF > 0.05), are
shown. The presence of a mutation is shown in yellow, and its absence in
grey. b, The number of non-silent point mutations shared by each number
of A549 strains. c, Top, unsupervised hierarchical clustering of 23 A549
strains, based on the allelic fractions of their non-silent SNVs. Bottom,
corresponding heat map, showing the allelic fractions of non-silent
mutations across the 23 A549 strains. Mutations that were identified in
only a subset of the strains are shown. The presence of a mutation is shown
in colour according to its allelic fraction. d, The allelic fractions of
non-silent mutations in six selected genes across 23 A549 strains. Note the
inactivating frameshift mutation in SMARCA4, one of the most frequently
mutated genes in lung adenocarcinoma, which was detected at an allelic
fraction of ~1 in 9 of the strains, but was not detected at all in the other
strains. e, The number of gene-level CNAs shared by each number of
A549 strains. Red, copy number gains; blue, copy number losses. f,
Variation in the copy number of CDKN2A. Red, copy number gains; blue,
copy number losses. Thresholds for relative gains and losses were set at 0.1
and −0.1, respectively. g, Unsupervised hierarchical clustering of 23 A549
strains, based on their global gene expression profiles. Strains expected
to cluster together based on their evolutionary history are highlighted
in blue. h, A t-SNE plot of L1000-based gene expression profiles from
multiple samples of nine cancer cell lines. The asterisk denotes the
23 A549 strains profiled in the current study. i, Comparison between
the L1000-based A549 expression profiles and the microarray-based
expression profiles from CCLE. Histograms show the distributions of the
Spearman correlations between the 23 A549 strains and A549 (light blue),
other non-small-cell lung cancer cell lines (purple), other lung cancer
cell lines (green) or non-lung cancer cell lines (grey). The comparison
is based on the 978 landmark genes directly measured in L1000. j, The
number of differentially expressed genes identified in all possible pairwise
comparisons of A549 strains, using a twofold change cutoff. k, Arm-level
gains are associated with significant upregulation and arm-level losses
are associated with significant downregulation of genes transcribed
from the aberrant arms. GSEA showing upregulation of the genes on
chromosome 2q in strains that have gained a copy of that arm (left), and
downregulation of the genes on chromosome 9q in strains that have lost a
copy of that arm (right). l, Gene-level CNAs are associated with significant
dysregulation of the perturbed pathways. GSEA reveals upregulation of
the genes that are upregulated, and downregulation of the genes that are
downregulated, when TP53 is knocked down in strains with MDM2
high-level copy number gain; and upregulation or downregulation of the G2/M
cell cycle checkpoint signature in strains with CDKN2A copy number
loss or CCND1 copy number gain. m, Point mutations are associated with
significant dysregulation of the perturbed pathways. For example, GSEA
reveals downregulation of two PRC2-related expression signatures in
strains with an inactivating SMARCA4 mutation. n, The 10 top gene sets
identified by GSEA to be significantly enriched among the genes that are most
differentially expressed across the A549 strains. The six gene sets related to
KRAS signalling are highlighted in red.

Extended Data Fig. 8 | Genetic variation across multiple strains of
additional cancer and non-cancer cell lines. a, The fraction of non-silent
SNVs that are discordant between pairs of strains of the same
cell line. Data are mean ± s.e.m.; n, number of strain pairs compared.
b, Arm-level CNAs arise in RPE1 samples. Plots show CNAs detected by
an e-karyotyping analysis of 26 RPE1 samples. Red, gains; blue, losses.
c, Comparison of variability in non-silent SNVs between non-transformed,
partially transformed and fully transformed MCF10A samples. Box
plots show the fraction of discordant non-silent SNVs between pairs of
samples within each category. Bar, median; box, 25th and 75th percentiles;
whiskers, data within 1.5× IQR of lower or upper quartile; circles, all
data points. One-tailed Wilcoxon rank-sum test. n = 28, 112 and 14
strain pairs for the non-transformed, partially transformed and fully
transformed groups, respectively. d, Comparison of the Broad-Sanger
allelic fraction correlations of cell lines derived from primary tumours
and those derived from metastases. Bar, median; coloured rectangle,
25th and 75th percentiles; width of the violin indicates frequency at
that value. One-tailed Wilcoxon rank-sum test. e, Top, comparison of
the chromosomal instability (CIN70) gene expression signature score
between CCLE lines derived from primary tumours and those derived
from metastases. Bottom, comparison of the weighted-genomic integrity
index (wGII) between CCLE lines derived from primary tumours and
those derived from metastases. Bar, median; coloured rectangle, 25th
and 75th percentiles; width of the violin indicates frequency at that value.
One-tailed Wilcoxon rank-sum test. f, Comparison of the Broad-Sanger
allelic fraction correlations of microsatellite-stable cell lines (MSS) and
microsatellite-unstable cell lines (MSI). Bar, median; box, 25th and 75th
percentiles; whiskers, data within 1.5× IQR of lower or upper quartile;
circles, all data points. One-tailed Wilcoxon rank-sum test. g, Heat maps
show the allelic fractions of non-silent mutations in multiple strains of
cancer cell lines. The presence of a mutation is shown in colour according
to its allelic fraction. h, Heat maps show the allelic fractions of non-silent
mutations in multiple strains of the non-cancer cell lines HA1E and
MCF10A. The presence of a mutation is shown in colour according to
its allelic fraction. Also shown is an unsupervised hierarchical clustering
of the 15 MCF10A strains, which represent different degrees of cellular
transformation, based on their non-silent mutation profiles.

Extended Data Fig. 9 | Characterization of cell proliferation and
morphology across 27 MCF7 strains. a, Growth response curves of 27
MCF7 strains, based on microscopy imaging. b, Doubling time of the 27
MCF7 strains, as measured by automatic microscopy imaging. c, Variation
in cellular radius across the 27 MCF7 strains. d, Variation in form factor, a
measure of circularity, across the 27 MCF7 strains. e, Variation in nuclear
radius across the 27 MCF7 strains. a-e, Data are mean ± s.d.; circles show
individual values; n = 3 replicate wells per data point. f, Microscopy
imaging of the 27 MCF7 strains, showing the morphological differences
between them. Scale bar, 300 μm. Images are representative of five
replicate wells per strain. g, Unsupervised hierarchical clustering of 27
MCF7 strains, based on 1,784 morphological features. h, The correlation
between proliferation rate (shown as doubling time) and the number
of non-silent protein-coding mutations, across 18 naturally occurring
MCF7 strains (that is, strains that have not undergone drug selection or
genetic manipulation). Spearman's ρ and P values indicate the strength
and significance of the correlation, respectively. i, The correlation between
proliferation rate (shown as doubling time) and the fraction of subclonal
mutations, across 18 naturally occurring MCF7 strains. Spearman's
ρ and P values indicate the strength and significance of the correlation,
respectively.




Extended Data Fig. 10 | Characterization of drug-response variation
across 27 MCF7 strains. a, Unsupervised hierarchical clustering of
27 MCF7 strains, based on their response to all 321 compounds in
the primary screen. Groups of strains expected to cluster together
based on their evolutionary history are highlighted, as in Fig. 1. b, Pie
chart of the classification of the screened compounds based on their
differential activity. The response to each active compound was defined
as 'consistent' if viability change was <−50% for all strains, 'variable'
if viability change was <−50% for some strains and >−20% for other
strains, or 'intermediate' if viability change was in between these values.
Classification was performed using a two-strain threshold. c, Pie charts as
in b, excluding strains Q and M, which were generally more drug resistant.
Classification was performed using a one-strain or a two-strain threshold
(left and right charts, respectively). d, Pie charts as in b, using an activity
threshold of viability change <−80%. Classification was performed using
a one-strain threshold, either including all strains (left) or excluding
strains Q and M (right). e, The number of gene-level CNAs shared by each
number of MCF7 strains. Red, copy number gains; blue, copy number
losses. f, The number of non-silent point mutations shared by each
number of MCF7 strains. The 10 naturally occurring connectivity map
strains were averaged and considered as a single sample. g, The correlation
between proliferation rate (shown as doubling time) and the number of
non-silent protein-coding mutations, across naturally occurring MCF7
strains (n = 10). Spearman's ρ and P values indicate the strength and
significance of the correlation, respectively. The 10 naturally occurring
CMap strains were averaged and considered as a single sample. h, The
correlation between proliferation rate (shown as doubling time) and
the fraction of subclonal mutations, across naturally occurring MCF7
strains (n = 10). Spearman's ρ and P values indicate the strength and
significance of the correlation, respectively. The 10 naturally occurring
CMap strains were averaged and considered as a single sample. i, The
number of differentially expressed genes identified in all possible pairwise
comparisons of MCF7 strains, using a twofold change cutoff. The 10
naturally occurring CMap strains were averaged and considered as a single
sample. j, Pie charts of the classification of the screened compounds based
on their differential activity. The response to each active compound was
defined as 'consistent' if viability change was <−50% for all strains, 'variable'
if viability change was <−50% for some strains and >−20% for other
strains, or 'intermediate' if viability change was in between these values.
Classification was performed using a one-strain or a two-strain resistance
threshold (top and bottom charts, respectively). The 10 naturally
occurring CMap strains were averaged and considered as a single sample.
k, The dose-response curves for ten compounds are shown. For each
compound, eight concentrations were tested in each strain. Two sensitive
strains and two insensitive strains are plotted. Each data point represents
the mean of two replicates. Nutlin-3, a compound that had no toxicity
against any of the strains in the primary screen, was included as a negative
control. Romidepsin, a compound that killed all strains very efficiently in
the primary screen, was included as a positive control and turned out to be
differentially active at lower concentrations. l, The Pearson's correlation of
the two compound screen replicates across the MCF7 strains. m, Strains
more sensitive to proteasome inhibitors exhibit higher proteasome
activity. The chymotrypsin-like activity of the proteasome was measured
in three sensitive and three insensitive strains. Data are mean ± s.d.; one-
tailed t-test, n = 4 replicate wells. n, Western blots of the relative protein
expression levels of the proteasome 19S complex members PSMC2 and
PSMD1 in three sensitive and three insensitive strains. The expression of
tubulin was used for normalization. The experiment was repeated once,
with n = 3 strains per group. For gel source data, see Supplementary Fig. 1.
o, Quantification of the relative expression of PSMC2 and PSMD1. Data
are mean ± s.d.; one-tailed t-test, n = 3 strains per group. p, Upregulation
of the KEGG cell cycle signature in strains sensitive to the cell cycle
inhibitor CDK/CRK inhibitor (n = 3) compared to insensitive strains
(n = 12). q, Upregulation of mTOR signalling in strains sensitive to the
PI3K inhibitor PP-121 (n = 11) compared to insensitive strains.
r, Downregulation of the genes that are downregulated when ALK is
knocked down in strains sensitive to the ALK inhibitor TAE-684 (n = 4)
compared to insensitive strains (n = 15). s, Upregulation of IL-6–JAK–
STAT3 signalling in strains sensitive to the STAT inhibitor nifuroxazide
(n = 9) compared to insensitive strains (n = 6). t, Upregulation of the
genes that are upregulated when AKT is overexpressed in strains sensitive
to the AKT inhibitor triciribine (n = 2) compared to insensitive strains
(n = 8). u, Upregulation of hypoxia signalling in strains sensitive to the
HSP90 inhibitor 17-AAG (n = 3) compared to insensitive strains (n = 15).
v, Downregulation of xenobiotic metabolism signatures in strains M and
Q (n = 2), which exhibited an increased resistance to most compounds
compared to the other strains (n = 25). w, Upregulation of the early and
late oestrogen response signatures in strains most sensitive to the ER
inhibitor tamoxifen (n = 5) compared to the least sensitive strains.
x, Sensitivity to oestrogen depletion and to tamoxifen is associated with
the copy number status of ESR1. Heat maps represent the relative viability
in oestrogen-depleted medium (top) and in response to tamoxifen
(166M; bottom).
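
The 'consistent', 'variable' and 'intermediate' labels defined in this legend can be illustrated with a short sketch. It is not the authors' code: the function name, the `min_strains` parameter (standing in for the one-strain or two-strain threshold) and the reading of that threshold as "at least this many strains on each side" are assumptions made for illustration only.

```python
def classify_response(viability_changes, min_strains=1,
                      sensitive_cutoff=-50.0, resistant_cutoff=-20.0):
    """Classify one active compound from per-strain viability changes (in %).

    Assumption: a compound is 'variable' when at least `min_strains` strains
    fall below the sensitive cutoff and at least `min_strains` strains stay
    above the resistant cutoff.
    """
    n_sensitive = sum(v < sensitive_cutoff for v in viability_changes)
    n_resistant = sum(v > resistant_cutoff for v in viability_changes)
    if n_sensitive == len(viability_changes):
        return "consistent"
    if n_sensitive >= min_strains and n_resistant >= min_strains:
        return "variable"
    return "intermediate"


# Hypothetical viability changes for four strains (not data from the study).
example = [-72.0, -65.0, -10.0, -58.0]
print(classify_response(example, min_strains=1))  # one-strain threshold
print(classify_response(example, min_strains=2))  # two-strain threshold
```

With these made-up values, the stricter two-strain setting moves the compound out of the 'variable' class, which is the kind of shift the paired pie charts are meant to show.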


Extended Data Fig. 11 | Comparison of genetic-, transcriptomic- and
drug-response-based clustering trees, genomic distances and CRISPR
dependencies. a, Comparison of clustering trees using the Fowlkes–
Mallows approach. The dendrograms were based on SNVs, gene-level
CNAs, arm-level CNAs, gene expression profiles and drug response
patterns and were all compared to each other. The Fowlkes–Mallows
index (Bk) was computed for all potential numbers of clusters (k values)
ranging from 5 to 26. The red lines indicate the observed Bk values,
whereas the grey lines represent the 95% upper quantile of the randomized
distribution. The maximum Bk value represents the degree of similarity
between the compared pair of dendrograms. The grey shading represents
the difference between the observed Bk values and those of the 95%
upper quantile of the randomized distribution. b, Force-directed layout
of screened lines using a similarity matrix determined by the probability
of cell lines clustering together in dependency space. Cell lines (nodes)
are coloured by lineage. c, Left, the overlap of dependencies in KPL1 and
MCF7 using corrected CERES scores, with genes showing depletion effects
in all cell lines (that is, pan-essential genes) excluded. The threshold
for dependency was set as a CERES score < −0.5. Right, overlap in
dependency with genes of indeterminate dependency status (CERES
scores between −0.4 and −0.6) in either cell line excluded. d, A two-
sample GSEA of MCF7 and KPL1 against the oestrogen response gene
sets (n = 1 sample per group). Expression of the oestrogen signalling
pathway is strongly enriched in MCF7. e, The correlation between ESR1
dependency values and the single-sample GSEA enrichment scores of the
oestrogen response hallmark gene sets (n = 27 cell lines). The difference
in oestrogen response signalling between MCF7 and KPL1 predicts their
differing levels of dependency on ESR1. f, The correlation between GATA3
dependency and GATA3 protein levels (z-scored values for reverse-phase
protein arrays; n = 27 cell lines). The difference in GATA3 protein levels
between MCF7 and KPL1 predicts their differing levels of dependency on
GATA3. Spearman's ρ and P values indicate the strength and significance
of the correlations, respectively. g, Top, comparison of proliferation rates
between a parental MCF7 population and its single-cell-derived clones.
Bottom, comparison of proliferation rates between two cultures of the
same single-cell clone, separated by six months of continuous passaging.
Box plots show the population doubling time of each sample. Bar, median;
box, 25th and 75th percentiles; whiskers, data within 1.5 × IQR of lower
or upper quartile; circles, all data points. Two-tailed t-test; n, replicate
wells. h, Top, comparison of the sensitivity to oestrogen depletion between
a parental MCF7 population and its single-cell-derived clones. Bottom,
comparison of the sensitivity to oestrogen depletion between two cultures
of the same single-cell clone, separated by six months of continuous
passaging. Box plots show the relative growth rate in oestrogen-depleted
medium. Bar, median; box, 25th and 75th percentiles; whiskers, data
within 1.5 × IQR of lower or upper quartile; circles, all data points. Two-
tailed t-test; n, replicate wells. i, The correlation between sensitivity to
tamoxifen (relative viability at 20 µM) and the sensitivity to oestrogen
depletion (relative growth rate), across the parental MCF7 populations and
their single-cell clones (n = 7). Spearman's ρ and P values indicate
the strength and significance of the correlation, respectively.

Plots between various measures used to estimate the distance between cell
line strains (n = 31 strain pairs). CNA distances (based on ultra-low-pass
whole-genome sequencing or targeted sequencing), SNV distances, gene
expression distances and drug response distances were compared to each
other. CNA distance based on ultra-low-pass whole-genome DNA sequencing
was determined by the fraction of the genome affected by discordant CNA
calls. CNA and SNV distances based on targeted sequencing were determined
by Jaccard indices. Gene expression and drug-response distances were
determined by Euclidean distances. Spearman's ρ and P values indicate the
strength and significance of the correlation, respectively.
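
For readers unfamiliar with the Fowlkes–Mallows index used in panel a, the sketch below computes it for two flat cluster assignments of the same items. It assumes the standard pair-counting definition, TP / sqrt((TP + FP)(TP + FN)); the function name and the toy labels are hypothetical. Bk in the legend would correspond to this value after both dendrograms are cut into k clusters, and the tree cutting and the randomized baseline are not shown here.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_a, labels_b):
    """Pair-counting Fowlkes-Mallows index between two clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1          # pair co-clustered in both clusterings
        elif same_a:
            fp += 1          # co-clustered in A only
        elif same_b:
            fn += 1          # co-clustered in B only
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Toy labels for four strains under two hypothetical data types.
print(fowlkes_mallows([0, 0, 1, 1], [0, 0, 1, 1]))
print(fowlkes_mallows([0, 0, 0, 1], [0, 0, 1, 1]))
```

Identical clusterings give an index of 1, and the value falls towards 0 as fewer pairs of strains are grouped together by both data types.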
Extended Data Table 1 | Implications of this study for the use of cell lines in cancer research


Findings

Given two strains, 20% of mutations would be observed in only one of them
Prolonged passaging introduces more variation than multiple freeze–thaw cycles
Various genomic, transcriptomic and phenotypic assays yield highly similar clustering trees
Genetic manipulations that are considered "neutral" can introduce genetic variation
Genetic and transcriptomic variation may affect drug response
Pre-existing heterogeneity within culture underlies cell line instability
Heterogeneity keeps emerging in culture due to ongoing genomic instability

Implications

There is a 10% likelihood that a mutation observed in a strain would not appear in a database of cell line genomic features
For most cell lines, freezing and thawing is likely to be associated with fewer changes than maintaining in culture
Simple and inexpensive genome-wide assays can serve as a proxy for diversification
Cell lines with fluorescent reporters, DNA barcodes or Cas9 expression are not identical to their parental cell lines
Inconsistencies in drug response studies may be attributed to genetic and transcriptomic variability
Transcriptional differences between sensitive and resistant strains can elucidate compound mechanism of action
Single cell-derived clones differ from one another genetically, transcriptionally and phenotypically
Subtle differences in culture conditions can lead to changes in cell line clonal composition
Prolonged passaging of single cell-derived clones can lead to their diversification
Cell lines with deficient maintenance of genome integrity (e.g., MSI or TP53-mutant) are more prone to genomic evolution

Recommendations

• Be cautious when using published datasets of genomic features as "lookup tables"
• Keep track of passage number
• Use passage-matched controls
• For large-scale screens, prepare multiple frozen vials for downstream analyses
• Use inexpensive genome-wide assays (e.g., LP-WGS) and compare to published references using Cell STRAINER: https://cellstrainer.broadinstitute.org
• Exclude strains that show extreme diversification
• Use efficient infection methods to reduce the bottleneck associated with antibiotic selection
• Characterize manipulated strains to ensure they retain hallmark genomic features
• In CRISPR screens, correct for copy number effects using the copy number landscape of the screened strain
• Genetic and transcriptomic distances should be considered when comparing drug response data
• Compare drug response data to genomic data from the same strain
• Use characterized isogenic-like strains to uncover associations between molecular features and drug response
• Confirm the genomic features of single cell-derived clones
• Avoid comparisons between bottlenecked cell populations, whenever possible
• Keep culture conditions constant
• Re-confirm genomic features of single cell-derived clones following prolonged passaging
• Apply these recommendations more stringently to genomically unstable cell lines

Creating a functional single-chromosome yeast


Yangyang Shao, Ning Lu, Zhenfang Wu, Chen Cai, Shanshan Wang, Ling-Li Zhang, Fan Zhou, Shijun Xiao, Lin Liu,
Xiaofei Zeng, Huajun Zheng, Chen Yang, Zhihu Zhao, Guoping Zhao, Jin-Qiu Zhou, Xiaoli Xue & Zhongjun Qin


Eukaryotic genomes are generally organized in multiple chromosomes. Here we have created a functional single- 
chromosome yeast from a Saccharomyces cerevisiae haploid cell containing sixteen linear chromosomes, by successive 
end-to-end chromosome fusions and centromere deletions. The fusion of sixteen native linear chromosomes into a single
chromosome results in marked changes to the global three-dimensional structure of the chromosome due to the loss of
all centromere-associated inter-chromosomal interactions, most telomere-associated inter-chromosomal interactions
and 67.4% of intra-chromosomal interactions. However, the single-chromosome and wild-type yeast cells have nearly
identical transcriptome and similar phenome profiles. The giant single chromosome can support cell life, although this
strain shows reduced growth across environments, competitiveness, gamete production and viability. This synthetic
biology study demonstrates an approach to exploration of eukaryote evolution with respect to chromosome structure
and function.


Almost all known natural eukaryotic species have multiple chromosomes,
except for the male ant Myrmecia pilosula, which contains only
one chromosome. In addition, the number of chromosomes in eukaryotic
species varies without a clear association with their biological
characteristics. For instance, in mammals, human (Homo sapiens) diploid
cells have forty-six chromosomes, whereas diploid cells of the Indian
muntjac (Muntiacus muntjak) have the lowest number of chromosomes
(six for the female and seven for the male). In fungi, haploid cells of the
budding yeast Saccharomyces cerevisiae have sixteen chromosomes and
a genome of approximately 12 Mb, whereas haploid cells of the fission
yeast Schizosaccharomyces pombe have only three chromosomes and a
genome of approximately 14 Mb. The advantages to a eukaryotic cell
of multiple chromosomes instead of a single one are not clear. In this
study, we have reorganized the genome of the unicellular eukaryotic
model organism S. cerevisiae, whose haploid cell contains sixteen
chromosomes ranging from 230 to 1,500 kb, into one giant chromosome,
in order to explore whether a yeast cell with an artificially fused single
chromosome can survive and complete a sexual cycle.


Rationale 

The creation of a single-chromosome yeast from S. cerevisiae BY4742
haploid cells required 15 rounds of chromosome end-to-end fusions,
with deletion of 15 centromeres and 30 telomeres (Fig. 1a, Extended
Data Table 1). During the fusion process, the following criteria and
principles were followed. (1) To generate genetically stable fused
chromosomes and avoid the formation of dicentric chromosomes,
simultaneous deletions of one centromere and two telomeres in each
round of fusion were required. We developed a method to precisely fuse
two chromosomes by using both the efficient CRISPR–Cas9 cleavage
system and the robust homologous recombination activity of yeast
(Fig. 1b). (2) The single centromere was intentionally kept roughly in
the middle of the final single chromosome to maintain two arms with
balanced lengths. (3) The order of chromosome fusion was randomly
selected. Our pilot experiments showed that eight pairs of randomly
selected chromosomes could all be successfully fused, and the resulting
strains grew as robustly as the wild-type strain, indicating that the
yeast cells could tolerate random fusion of two chromosomes. (4) The
deleted regions of each centromere and telomere were carefully selected
to avoid affecting adjacent genes. (5) In addition, the redundant copies
of telomere-associated long repetitive sequences (over 2 kb; Extended
Data Table 2) located on different chromosomes were deleted to avoid
potential homologous recombination at undesired sites.
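
The counts quoted in this section follow from criterion (1): each end-to-end fusion removes one centromere and two telomeres. A minimal bookkeeping sketch (the function and key names are hypothetical, introduced only for illustration) reproduces the figures for sixteen starting chromosomes.

```python
def fusion_bookkeeping(n_chromosomes):
    """Count deletions needed to fuse n linear chromosomes into one.

    Each fusion round joins two chromosomes end to end, deleting one
    centromere and two telomeres, as described in criterion (1).
    """
    rounds = n_chromosomes - 1
    return {
        "fusion_rounds": rounds,
        "centromeres_deleted": rounds,
        "telomeres_deleted": 2 * rounds,
        "centromeres_remaining": n_chromosomes - rounds,
        "telomeres_remaining": 2 * n_chromosomes - 2 * rounds,
    }


print(fusion_bookkeeping(16))
```

For sixteen chromosomes this gives 15 rounds, 15 centromeres and 30 telomeres deleted, leaving the one centromere and two telomeres expected of a single linear chromosome.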


Chromosome fusion and confirmation 

The first chromosome fusion strain, SY0, was constructed by simultaneously
removing the telomere and telomere-associated long repetitive
sequences of the right arm of chromosome VII (VII-R) and the left arm
of chromosome VIII (VIII-L), as well as the centromere element in
chromosome VIII (Fig. 1b). Following the same pair-wise fusion strategy,
fourteen successive rounds of chromosome fusion were carried out in
strain SY0, and finally strain SY14, with all sixteen chromosomes fused
into one giant single linear chromosome, was successfully constructed
(Extended Data Table 1). For each round of chromosome fusion, the
positive rates confirmed by PCR sequencing ranged from 20 to 100%
(Extended Data Table 1).

To validate the series of chromosome fusions in strains SY0–SY14,
we examined the chromosome numbers using pulsed-field gel
electrophoresis (PFGE) under various optimal conditions (Fig. 2a). With the
accumulation of chromosome fusions, the DNA bands corresponding
to the native chromosomes disappeared accordingly in the lower parts
of the gels and the DNA bands corresponding to the newly fused, larger
chromosomes (indicated with red arrowheads) appeared in the upper
parts of the gels. The single linear chromosome of SY14 (11.8 Mb)
migrated most slowly and remained at the top of the gel (Fig. 2a).

Fig. 1 | Creation of a single-chromosome yeast. a, Sixteen native
chromosomes (I–XVI) of BY4742 (wild type) are aligned in the outer ring.
The single chromosome of SY14 aligned in the inner ring has undergone
fifteen sequential rounds of chromosomal end-to-end fusions, indicated by
dashed lines. b, CRISPR–Cas9-mediated fusion of chromosomes VII and
VIII. Cas9 nuclease cut at the telomere (sites S1 and S2) and centromere
(site S3) loci with the guidance of gRNAs 1–3. The broken chromosomes
were repaired through homologous recombination with the provided DNA
targeting cassettes. The curation of URA3 and the guide RNA expression
plasmid (pgRNA) occurred simultaneously upon galactose induction.


The sizes of all chromosomal DNA bands were in agreement with
the theoretical calculated sizes (Extended Data Table 1). In addition
to PFGE, we performed Southern hybridization with a specific
telomeric DNA probe to further confirm the proper chromosome
fusions. Following each round of chromosome fusions, the corresponding
telomere signals disappeared owing to deletion of these telomere
sequences (Fig. 2b, Extended Data Fig. 1). The reduction in telomere
numbers in the SY14 strain was visualized by immunostaining of
the 13-myc-tagged telomere-binding protein Sir2 using anti-myc
antibody. In BY4742 cells, 32 telomeres clustered at 6–8 foci in the
nuclei (Fig. 2c), consistent with previous reports. Only one or two
telomere signals were detected in SY14 cells. Notably, nucleolar structures
in the SY14 cells, which could be seen by Nop1 staining (Fig. 2c),
remained intact, suggesting that chromosome fusions caused little
change in the chromatin region of ribosomal DNA (rDNA) loci.

The genomic DNA of the SY14 strain and its parental strain BY4742
were completely sequenced by a combination of PacBio and Illumina
sequencing with 426- and 320-fold coverage, respectively. Their
chromosomal nucleotide sequences were de novo assembled into 1 and

Fig. 2 | Confirmation of chromosome fusion(s) in yeast strains. a, Intact
chromosomal DNA analysis by PFGE under different conditions. The
red arrows indicate the newly fused chromosomes in each strain. Data
shown are representative images of three independent experiments. b, The
XhoI-digested genomic DNAs from the strains were Southern blotted with
a telomere-specific TG1–3 probe. The X-element- and X-Y'-containing
telomeres are indicated in blue and red, respectively, in parentheses. Results
are representative of two independent experiments. c, The myc-tagged
telomere-binding protein Sir2 was detected with polyclonal anti-myc antibody
and a fluorophore-conjugated (red) secondary antibody. Nop1, a nucleolar
protein, was detected with a monoclonal anti-Nop1 antibody and Alexa
488-conjugated (green) secondary antibody. DNA was stained by DAPI (blue).
Scale bar, 1 µm. Data represent mean ± s.d. (10 sections per genotype from
three independent experiments). d, Comparison of the genome sequences of
BY4742 (right) and SY14 (left). MT, mitochondrial genome.


16 contigs, respectively. The approximately 1.5-Mb array of rDNA
repeats (approximately 9.1 kb) was difficult to assemble, and only
several copies of rDNA repeats were assembled in both the BY4742
and SY14 genomes. The complete nucleotide sequences of the single
chromosome of SY14 were compared with the sixteen chromosomes
of BY4742, which showed excellent co-linearity (Fig. 2d), confirming
that the chromosome orderings and orientations were as designed.
Sequence alignments of the BY4742 and SY14 chromosomes revealed
that all 61 designed deletions (Extended Data Table 1) had been
successfully achieved (Fig. 2d, Extended Data Fig. 2); however,
1 single-nucleotide polymorphisms (SNPs) and 7 insertions and
deletions (indels) that arose during chromosome fusions were detected
using next-generation sequencing and validated by the Sanger method
(Extended Data Table 3).


Chromosomal 3D structures 
Previous studies have documented the higher-order folding and spatial
architecture of all sixteen chromosomes in S. cerevisiae cells. We
carried out chromosome conformation capture (3C)-derived Hi-C
assays on BY4742 and SY14 cells and on two intermediate strains,
SY6 (containing nine chromosomes: seven fused and two native) and
SY13 (containing only two fused large chromosomes) (Extended Data
Table 1). The global chromosome interactions and average chromosome
3D architecture of the different fusion cells were analysed and
compared with those of BY4742 cells.

The contact maps of BY4742, SY6, SY13 and SY14 cells clearly
showed sixteen, nine, two and one independent, distinct intra-
chromosomal interaction large square lattice structures, corresponding
to the sixteen, nine, two and one individual chromosomes, respectively
(Fig. 3a). The Z-score difference contact maps showed that the strong
centromere–centromere interactions gradually disappeared along
with the loss of corresponding centromeres (blue dots), but the
interactions among the retained centromeres became much stronger
(red dots) (Extended Data Fig. 3). Consistent with a previous report,
the sixteen chromosomes of BY4742 cells and the nine chromosomes
of SY6 cells showed a typical Rabl configuration: centromeres
clustered around the spindle pole body, telomeres clustered with the nuclear
envelope, and chromosome arms extending between these two
anchoring points (Fig. 3b). Owing to the marked reduction in centromere and
telomere numbers, the overall genome structures in SY13 and SY14
cells exhibited a relatively twisted, globular configuration, with both
the centromeres and telomeres located roughly on the periphery of
the whole structure and the two arms of each chromosome much more
bent than in BY4742 or SY6 cells, perhaps owing to the nuclear size
limitation. It is worth noting that even in the case of the two
chromosomes in SY13, the two centromeres and four telomeres were still
clustered in roughly opposite positions in the nucleus, similar to the cases
of BY4742 and SY6. Notably, the rDNA-repeat loci of all four strains
were sequestered from the main structure (Fig. 3b). In particular,
the substantial clustering of the flanking sequence of centromeres
(red balls) and relative co-localization of the flanking sequence of
telomeres (blue balls) in the BY4742 genome gradually disappeared
as the chromosome fusion progressed from BY4742 to SY6, SY13
and SY14 (Fig. 3b). Notably, chromosomal fusion in SY6 caused
little change to the configurations of the unfused chromosomes, such
as chromosome XV, compared to those in BY4742 (Extended Data
Fig. 4). However, with the accumulation of chromosome fusions,
which resulted in a larger chromosome and a loss of the original
centromeres, the 3D structures of chromosomes VI, XVI and X
changed from stretched V shapes to more twisted globular shapes
(Extended Data Fig. 4).

Almost all (97.8% and 99.7%) of the significant (P < 0.01, q < 0.01)
inter-chromosomal interactions observed in BY4742 were absent in
SY13 and SY14, respectively (Fig. 3c), probably owing to the
elimination of most of the original centromeres and telomeres (Extended
Data Fig. 3a, b). On the other hand, chromosome fusions brought two

Fig. 3 | Chromosomal interactions and 3D structures of genomes in
BY4742, SY6, SY13 and SY14 strains. a, The normalized contact heatmaps
of four genomes with 10-kb resolution. Low to high interaction
frequencies are depicted by colour spectra from light yellow to red.
b, 3D conformations of the four genomes. c, Venn diagram for the numbers
of significant (P < 0.01, q < 0.01) inter- (left) and intra-chromosome
(right) interactions. Note that intra- and inter-chromosome here refer to
locations in the BY4742 genome. d, The directional preferences of
chromosome II from four genomes. A paired t-test (y-axis) assessed the
interaction preference of a specific genomic region (genomic coordinates
with 5-kb bins shown in x-axis) against its upstream (negative t-value)
or downstream (positive t-value) regions.

chromosomes that were distal from each other in BY4742 cells into
close proximity, resulting in new significant (P < 0.01, q < 0.01)
inter-chromosomal interactions: for example, the interaction between
chromosomes XV and V in SY13 cells, and chromosomes XV and XII in
SY14 cells (Extended Data Fig. 3c). There were ten residual interactions
of the single chromosome XV centromere region and chromosome II
in all four strains, but the 3D structures of chromosomes XV and II did
not show any possible spatial interactions between their centromere
regions (Extended Data Fig. 3d). Unlike inter-chromosomal
interactions, only 67.4% of total intra-chromosomal interactions were lost in
the SY14 genome (Fig. 3c). In fact, the global direction preference,
which quantifies the preference of a specific genomic region against
its upstream or downstream interaction, was similar among BY4742,
SY6, SY13 and SY14 cells for each chromosome (Fig. 3d, Extended Data
Fig. 5), and the correlation coefficient was 0.90 (P < 2.2 × 10⁻¹⁶). This
result strongly indicated that the local chromatin interactions of all the
four strains, at least at the level of gene loci (as shown by the bin = 5 kb
direction preference plot), were very similar.
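
The direction preference described above can be illustrated with a small sketch: for one bin of a contact matrix, a paired t-statistic compares its contacts with downstream bins against its contacts with upstream bins at matching distances. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the `window` parameter and the toy matrix are hypothetical, and the source does not specify how many flanking bins were compared.

```python
import math
import statistics


def direction_preference(contacts, i, window):
    """Paired t-statistic for bin i: downstream minus upstream contacts.

    Positive values mean bin i interacts more with downstream bins,
    negative values mean it interacts more with upstream bins.
    """
    diffs = []
    for d in range(1, window + 1):
        if i - d < 0 or i + d >= len(contacts):
            break  # keep upstream and downstream bins strictly paired
        diffs.append(contacts[i][i + d] - contacts[i][i - d])
    if len(diffs) < 2:
        return 0.0  # not enough pairs for a t-statistic
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


# Toy 7-bin contact matrix; only the row for bin 3 is filled in.
toy = [[0] * 7 for _ in range(7)]
toy[3] = [1, 1, 1, 0, 5, 4, 2]
print(direction_preference(toy, 3, window=3))
```

Computing this value for every 5-kb bin along a chromosome gives a profile of the kind plotted in Fig. 3d, and profiles from different strains can then be correlated.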


Transcriptome and phenome analysis
The transcriptome profiles of the BY4742 and SY14 strains were
analysed to evaluate the effects of changes in chromosome interactions and
structure on global gene expression. Unexpectedly, the transcriptome
of SY14 cells was nearly identical to that of BY4742 cells (Fig. 4a). Only
28 genes were differentially expressed in SY14 compared to BY4742
cells (Fig. 4b, Extended Data Table 4), accounting for 0.5% of the 5,815
protein-coding genes. Fusion of all sixteen chromosomes into one
repositioned the original telomere-adjacent genes to loci distal from
telomeres, which presumably resulted in loss of the telomere position
effect (TPE) and de-repression of these genes. Accordingly, seven
genes (YFR057W, MAL11, THI5, YOL163W, YOL162W, SEO1 and
VTH1) adjacent to the corresponding deleted telomeres were
upregulated in SY14 cells. Notably, five genes (ERR2, HSP32, FEX2, YPL277C
and MPH3) near the retained telomeres (XVI-L and X-R) in SY14
cells were downregulated, indicating an increase in the TPE. As about
half of subtelomeric genes were deleted during chromosome fusions,
the number of TPE-affected genes is likely to be underestimated in

Fig. 4 | Transcriptome and phenome analyses. a, Heatmap of the
transcriptome profiles of BY4742 and SY14 cells. The Pearson correlations
(n = 3) are greater than 0.98 within each group and greater than 0.97
between different groups. b, Classification of differentially expressed
genes, defined as those with log2(fold change) > 1 and P < 0.001 in
SY14 compared to wild-type cells. c, Fitness analysis of SY14 cells under
various growth conditions. Representative results of two independent
experiments. d, Growth comparison of BY4742 and SY14 cells under
various conditions. The mean area of growth kinetics of SY14 cells from
two independent experiments was normalized to those of BY4742 cells,
and the numerical value of its logarithm base 2 is shown. The grey shaded
negative values indicate greater than 50% growth reduction in SY14 cells.
e, Restored growth of SY14 cells on medium without methionine by
complementation of a functional MET14 gene. SD, synthetic dextrose.
Data are representative of three independent experiments.


SY14 cells. Notably, eight genes involved in stress responses (especially
DNA replication) were upregulated in SY14, suggesting that a giant
single chromosome might introduce a new burden for chromosomal
replication.

The SY14 cells demonstrated a slight reduction in growth fitness on
complete media such as yeast extract peptone dextrose (YPD), yeast
complete (YC) or YPG (similar to YPD but with glycerol as a carbon
source), and an increased sensitivity to the genotoxic chemical
phleomycin (Phl), but not methyl methanesulfonate (MMS) or camptothecin
(CPT) (Fig. 4c). Phenotype microarray analysis showed that SY14
and BY4742 cells had comparable growth under conditions including
different carbon sources and osmolytes, but SY14 cells showed a modest
growth reduction under some nitrogen sources (Fig. 4d). We found that
the expression of the MET14 gene, which encodes an adenylyl-sulfate
kinase, was reduced in SY14 cells compared to the wild type (log2
fold-change = −1.56, P = 5.97 × 10-%; Extended Data Table 4). When
a plasmid-borne MET14 gene was introduced into SY14 cells, their
growth on medium without methionine was restored (Fig. 4e),
suggesting that deletion of the chromosome XI centromere accidentally
damaged the centromere-proximal promoter of MET14.
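
The differential-expression criterion from the Fig. 4 legend and the 0.5% figure quoted earlier can be checked with a short sketch. Two points are assumptions rather than statements from the source: the cutoff is applied to the absolute log2 fold change (MET14, with a log2 fold change of −1.56, is described as reduced and listed in Extended Data Table 4), and the P value used in the example call is hypothetical because the published exponent is not legible here.

```python
def is_differentially_expressed(log2_fold_change, p_value,
                                min_abs_log2_fc=1.0, max_p=0.001):
    """Apply a fold-change and P-value cutoff to one gene.

    Assumption: the fold-change cutoff is two-sided (absolute value).
    """
    return abs(log2_fold_change) > min_abs_log2_fc and p_value < max_p


def percent_of_genes(n_hits, n_total):
    """Share of genes passing the cutoff, in per cent."""
    return 100.0 * n_hits / n_total


# 28 differentially expressed genes out of 5,815 protein-coding genes.
print(round(percent_of_genes(28, 5815), 1))

# MET14: log2 fold change of -1.56 from the text; the P value is a placeholder.
print(is_differentially_expressed(-1.56, 1e-5))
```

The first call confirms that 28 of 5,815 genes rounds to 0.5%.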

The size and shape of BY4742 and SY14 cells were similar (Fig. 5a).
The pattern of cell cycle progression of SY14 cells resembled that of
BY4742 cells (Fig. 5b), but the SY14 strain showed a slightly reduced
growth rate (Fig. 5c). To evaluate whether the single-chromosome
yeast could compete with the multi-chromosome yeast, we co-cultured
SY14 and BY4742 cells and monitored their growth. As co-culture time
increased, the number of SY14 haploid cells dropped rapidly, while the

Fig. 5 | Sporulation and competition fitness. a, Scanning electron
micrographs of BY4742 and SY14 cells. Scale bar, 1 µm. Representative
images of three independent experiments. b, Cell cycle analysis. The yeast
cells were synchronized with hydroxyurea and the progression of the cell
cycle was analysed by flow cytometry. Data are representative of three
independent experiments. c, Growth curves of the SY14 haploid (left)
and diploid (right) strains (mean ± s.e.m.). Three biological replicates
were assayed. d, Growth competition of BY4742 and SY14 haploid cells.
Data from five biological replicates. e, Sporulation efficiency of diploid
cells with different chromosome numbers. For each genotype, two
independent diploid colonies were examined, and the results shared the
same trend. f, Spore viability of tetrads. Spores of ten tetrads are shown.
The spore viability was calculated from 60 tetrads for each strain. Data are
representative of two independent experiments.


BY4742 haploid cells dominated the populations (Fig. 5d), suggesting 
that the competition fitness ofthe single-chromosome yeast is lower, 


Meiosis and spore viability 
Organisms that reproduce sexually are thought to have advantages over organisms that reproduce asexually. We evaluated the ability of the single-chromosome haploid cells to mate, form diploid cells and reproduce sexually. We constructed strains BY4742* and SY14*, in which the MATα cassette was replaced with a MATa cassette. Haploid SY14 and SY14* cells were able to mate to produce diploid cells, similarly to the parental strains. But the SY14/SY14* cells displayed a slightly reduced growth rate (Fig. 5c). In addition, we noticed that two out of six colonies of SY14/SY14* diploid cells could not maintain a correct diploid chromosome number upon mitotic division. Moreover, the SY14/SY14* cells displayed weak competitiveness when co-cultured with BY4742/BY4742* cells, and we observed the emergence of cells that contained genomes from both diploid cells (Extended Data Fig. 6a–c), indicating fusion of SY14/SY14* and BY4742/BY4742* cells.

The SY14/SY14* cells were able to undergo meiosis to produce viable spores, but with reduced gamete production (Fig. 5e). In addition, the spore viability for SY14/SY14* cells was 87.5%, which was lower than the 98% observed for BY4742/BY4742* cells (Fig. 5f). The diploid cells of intermediate strains SY6/SY6* and SY13/SY13* displayed 96.5% and 93% spore viability, respectively (Fig. 5f), suggesting that spore viability decreases as the number of chromosome fusions increases.


Discussion 
Recently, synthetic biology has made great advances in the design and synthesis of individual chromosomes in the eukaryote S. cerevisiae. The synthesized cells largely resemble the wild-type cells, implying that this organism can tolerate large-scale genome engineering well. In this study, we created a biologically functional S. cerevisiae (SY14) with a single giant chromosome by successive fusion of sixteen native chromosomes, representing, to our knowledge, the first example of a eukaryote with a single linear chromosome created in the laboratory.

Previous studies have suggested that the localization of a chromosome in the nucleus and inter-chromosome interactions affect gene expression. In our study, chromosomal fusions involving sixteen chromosomes result in a loss of the majority of the inter-chromosomal interactions seen in parental cells, leading to marked changes in the overall chromosomal 3D structure. However, the global directional preferences at the level of gene loci (5-kb intervals) are largely retained in the SY14 cells. Accordingly, the transcriptome of the single-chromosome SY14 cells is nearly identical to that of the parental BY4742 cells. These observations demonstrate that inter-chromosomal interactions have a negligible effect on global gene transcription in yeast.

It was unexpected that the single point centromere in S. cerevisiae, which is only 125-bp long, can support the segregation and partition of the 11.8-Mb chromosome, which is eight times larger than the longest native chromosome. Nevertheless, several genes involved in the stress response, especially DNA replication stress, are upregulated in the single-chromosome yeast, consistent with the reported increase in replication-induced topological stress with chromosome length in S. cerevisiae. The tendency of SY14/SY14* diploid cells to form polyploids also suggests a functional defect of chromosome segregation in single-chromosome yeast; this is likely to cause the reduction in gamete production and viability in meiosis. Consistently, both the haploid and diploid cells of the single-chromosome yeast are disadvantaged when competing with wild-type cells. Therefore, the deleterious functional impact of a single giant chromosome, which may be due to chromosome replication and segregation, might explain why eukaryotic genomes are organized into multiple chromosomes. In fact, S. cerevisiae and its wild relative species have all maintained sixteen chromosomes across 10–20 million years of evolution, although their chromosome structures are not identical. In an accompanying paper, another group created a two-chromosome budding yeast. Their results are consistent with ours in that the chromosome fusions have minimal effects on cell growth and the transcriptome.

This study provides an alternative approach for studying the evolution of eukaryotes with respect to their chromosome structure and function. The series of strains (SY0–SY14) with successive fusions of sixteen chromosomes created in this study would be of considerable value for future investigations of telomere biology, centromere and kinetochore biology, meiotic recombination, and the relationship between nuclear organization and function.

METHODS
No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Plasmid constructions. The Cas9 expression plasmid pCas9 was constructed from pMetCas9 by replacing the selection marker MET14 with LEU2. The guide RNA expression plasmid (pgRNA) was constructed in three steps. (1) The vector pHIS426 was constructed by Gibson assembly of the HIS3 gene amplified from S. cerevisiae S288C genomic DNA and vector backbones amplified from p426-SNR52p-gRNA.CAN1.Y-SUP4t. (2) Each guide RNA expression cassette contains the SNR52 promoter, a 20-bp target sequence and the structural component of guide RNA, followed by the SUP4 3′ flanking sequence. The 20-bp target sequences of guide RNAs 1, 2 and 3 were manually selected from upstream of any NGG near the to-be-deleted centromere and telomeres. The guide RNA expression cassettes were generated by fusion PCR of two segments, the SNR52 promoter and the segment comprising the guide RNA structural component and the SUP4 3′ flanking sequence, both of which were PCR amplified using p426-SNR52p-gRNA.CAN1.Y-SUP4t (Addgene plasmid ID: 43803) as a template. For each guide RNA expression cassette, the 20-bp RNA target sequence and specific restriction sites were introduced at the end of PCR primers. (3) The three target cassettes gRNA1, gRNA2 and gRNA3 were digested with pairs of restriction enzymes EcoRI–BamHI, BamHI–NcoI and NcoI–NotI, respectively, and were ligated to an EcoRI–NotI-digested pHIS426.
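
The target selection in step (2) was done manually, but the underlying rule (a 20-bp stretch lying immediately upstream of an NGG) is easy to sketch in code. The following is only an illustration under stated assumptions: the function name `find_cas9_targets` and the example sequence are hypothetical, only the given strand is scanned, and no off-target scoring is attempted.

```python
import re

def find_cas9_targets(seq, target_len=20):
    """Return (start, target, pam) for every target_len-mer followed by NGG.

    Hypothetical helper: scans the given strand only. A fuller tool would
    also scan the reverse complement and check for off-target matches.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidates be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % target_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Made-up sequence, used only to show the call.
example = "TTGACCTAGCATCGATCGGATCCATGCAAGGTTT"
for start, target, pam in find_cas9_targets(example):
    print(start, target, pam)
```

Candidates returned this way would still need to be filtered by hand for proximity to the centromere or telomere being deleted, as described above.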

CRISPR–Cas9 facilitated chromosome fusion. Approximately 1 μg of each DNA targeting cassette (with 50–400 bp homology arms and 200 bp direct repeat (DR) sequences for the curation of selection markers in the second step) and pgRNA were co-transformed in S. cerevisiae BY4742 (Euroscarf; not tested for mycoplasma) cells harbouring pCas9, which constitutively expressed Cas9, using a standard lithium acetate transformation protocol. The transformation products were plated on the synthetic drop-out medium SC-Ura-His-Leu (omitting uracil, histidine and leucine). The positive colonies verified by PCR sequencing were inoculated and grown in 3 ml of SC-Ura-His-Leu liquid medium to saturation at 30 °C. The cell cultures were then transferred to SC-Leu medium containing % galactose and 3% raffinose instead of glucose, with an initial optical density of OD600 = 0.3. After 16 h, 100 μl of culture was plated and grown on SC-Leu with 1 mg/ml 5-FOA. The curation of selection markers and pgRNA of the positive colonies was verified by PCR analysis and sequencing. The verified single colony was inoculated in SC-Leu medium to start the next round of chromosome end-to-end fusion.

Telomere Southern blot. Telomere Southern blotting and hybridization were performed as previously described. In brief, roughly equal amounts of genomic DNA were digested with XhoI, and separated by electrophoresis on a 1.0% agarose gel. The DNA was then transferred to a Hybond-N+ Nylon membrane (GE Healthcare). Probe labelling, hybridization and immunological detection were performed using DIG-High Prime DNA Labelling and Detection Starter Kit II (Roche). An 81-bp TG1–3 DNA fragment was chosen as a telomere-specific probe, and a fragment from a chromosome served as a non-telomeric control probe.

Genome sequencing, assembly and data accessibility. A total of 20 μg of high-quality genomic DNA was extracted from BY4742 and SY14 cells. A 20 kb SMRTbell sequencing library (Pacific Biosciences) was constructed using a size selection protocol on the BluePippin (Sage Science). The two SMRTbell yeast genomic libraries were sequenced using 3 SMRT cells (Pacific Biosciences) with a 10-h moving time window on the PacBio Sequel (Pacific Biosciences) sequencing platform. Primary filtering on polymerase reads was performed using the SMRT analysis package v4.0 (https://www.pacb.com/support/software-downloads).

To assemble the BY4742 and SY14 genomes, 320× and 426× of PacBio subreads were used, respectively. The genomic sequence data of SY14 and BY4742 genomes were directly assembled into 1 and 16 contigs using CANU (version 1.5) without additional data or scaffolding steps. The assembled genomes have no Ns in their sequences. Owing to the high number of repeats in the rDNA region (100–200 copies of ~9,100 bp unit), we could not assemble all repeat sequences for this highly repeated region. Using longer reads of PacBio sequencing, we have assembled a repeated region longer than reported in the public S288C reference genome (GCF_000146045.2_R64). The nuclear genomes were revised using pbalign (https://github.com/PacificBiosciences/pbalign, version 0.3.1; algorithm: blasr, hitPolicy: randombest, algorithmOptions: -bestn 1 -nCandidates -bam, maxDivergence 15.0, minAccuracy 85.0) and arrow (https://github.com/PacificBiosciences/GenomicConsensus, version 2.0, minCoverage 15). The mitochondrial genomes of BY4742 and SY14 were assembled using minimap (version 0.2-r124-dirty) and miniasm (version 0.2-r137-dirty). The revision of mitochondrial genomes of BY4742 and SY14 cells was performed using the same procedure with modified parameters of maxDivergence 30.0, minAccuracy 70.0 (pbalign) and minCoverage 30 (arrow). The mitochondrial genomes were cyclized with minimus2 (https://github.com/sanger-pathogens/circlator/wiki/Minimus2-circularization-pipeline, AMOS, version 3.1.0).


The genomes of BY4742 and SY14 cells were aligned using LAST (version 10). The scripts last-split and maf-swap were used to achieve 1-to-1 alignment according to the manual (http://last.cbrc.jp/doc/last-split.html). The alignments between two genomes were extracted from LAST output using a custom Python script and plotted with Circos (version 0.67-7). To correct for possible errors of Single Molecule Real-Time (SMRT) sequencing reads (predominantly either deletions or insertions) and to obtain the reliable genome difference between BY4742 and SY14, we re-sequenced both SY14 and BY4742 whole genomes using next generation sequencing (NGS) Illumina paired-end sequencing, which provided 200× coverage for SY14 and 233× coverage for BY4742 cells. These new sequence data were mapped to the S288C genome to identify the genomic differences between the two strains, which were further validated by Sanger sequencing.

Hi-C library preparation and sequencing. The genomic DNA from exponential phase cells was cross-linked and digested with 200 units of MboI enzyme (NEB) as previously described. The cut DNA ends were labelled with biotin-14-dCTP (TriLink) followed by ligation. Purified DNA was sheared to a length of 00 bp. Point ligation junctions were pulled down with Dynabeads MyOne Streptavidin C1 (Thermo Fisher). The Hi-C library for Illumina sequencing was prepped with the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB) according to the manufacturer's instructions. Fragments between 400 and 600 bp were paired-end sequenced on a HiSeq2000 platform (Illumina).

Construction of contact map and chromosome 3D model from Hi-C data. Using the ICE software package (version 1815d0cce), the Hi-C data of BY4742 cells were iteratively mapped to the BY4742 genome, while the Hi-C data of SY6, SY13 and SY14 cells were mapped to their own genomes. Dangling ends and other unusable data were filtered, and the valid pairs were binned into 10-kb non-overlapping genomic intervals to generate contact maps. The contact maps were normalized using an iterative normalization method to eliminate systematic biases. To correct the overall decay of chromatin contacts with genomic distance, the whole genome interactions of BY4742 compared to SY6, SY13 and SY14 cells were transformed into Z-scores using E. Crane's method (package cworld-dekker from https://github.com/dekkerlab/cworld-dekker/releases, 01). Then, we calculated the difference between the Z-scores of SY14, SY13 and SY6 to BY4742 to infer the difference between their whole genome interactions. The chromosomal 3D structures of the four strains were inferred using the Pastis (01) method with a multidimensional scaling (MDS) model. The 10-kb contact maps were used to construct the 3D model. The rDNA region was reconstructed by assuming that every bin in rDNA loci was equally in contact with the remainder of the genome.
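
The idea behind the distance correction can be sketched briefly. The code below is a simplified illustration, not the cworld-dekker implementation: it assumes a small symmetric contact map held as nested lists, standardizes every contact against all contacts at the same bin separation, and then subtracts the Z-scores of one map from another. The function names are hypothetical.

```python
from statistics import mean, pstdev

def distance_zscores(contact_map):
    """Z-score each contact against all contacts at the same bin separation."""
    n = len(contact_map)
    z = [[0.0] * n for _ in range(n)]
    for d in range(n):
        # All contacts between bins that are d bins apart (one diagonal).
        diagonal = [contact_map[i][i + d] for i in range(n - d)]
        mu = mean(diagonal)
        sigma = pstdev(diagonal)
        for i in range(n - d):
            # A diagonal with no spread carries no signal; report 0.
            value = 0.0 if sigma == 0 else (contact_map[i][i + d] - mu) / sigma
            z[i][i + d] = z[i + d][i] = value
    return z

def zscore_difference(map_a, map_b):
    """Element-wise difference of distance-corrected Z-scores (a minus b)."""
    za, zb = distance_zscores(map_a), distance_zscores(map_b)
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(za, zb)]
```

In such a difference map, positive entries mark contacts that are stronger than expected for their separation in the first strain relative to the second, which corresponds to the red and blue colouring of the heatmaps in Extended Data Fig. 3a.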

Calculation of intra- and inter-chromosome interactions. As the correlation coefficient of the two biological replicates for each strain was very high (>0.98) using QuASAR-Rep analysis from HiFive (v1.5.3), we pooled the data from two replicates together for significant interactions. The contacts between 5-kb pairs of intra-chromosome bins of four strains were transferred to Ay's Fit-Hi-C software (1.0.1) to calculate the corresponding cumulative probability P value and false discovery rate (FDR) q value. After calculation, the interactions in which both the P value and q value were less than 0.01 were identified as significant interactions. The statistical significance and false discovery rate of the inter-chromosome interactions of BY4742, SY6 and SY13 cells were calculated using Xs method. We then calculated the FDR q value, and the interactions in which both the P value and q value were less than 0.01 were identified as significant interactions.
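
The double threshold (P < 0.01 and q < 0.01) can be illustrated with a short sketch. It assumes that q values are obtained with the Benjamini–Hochberg procedure, which is an assumption made for this example rather than a statement about the exact routine used in the analysis; the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def significant(p_values, alpha=0.01):
    """Flag interactions whose P value and q value are both below alpha."""
    q_values = benjamini_hochberg(p_values)
    return [p < alpha and q < alpha for p, q in zip(p_values, q_values)]

print(significant([0.001, 0.008, 0.04, 0.5]))
```

With these four illustrative P values only the first interaction passes both cut-offs, because the second has a q value of about 0.016.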

Comparison of significant interactions. Interactions were considered significant at P < 0.01, q < 0.01. The coordinates of SY14, SY13 and SY6 genome bins were mapped to those in the BY4742 genome by sequence alignment. For better comparison, we separated the interactions into intra-chromosome and inter-chromosome (between original chromosome regions) and performed pairwise comparison. We defined the sequences of 100 MboI-cutting sites around the centromere as the centromere regions, and the sequences of 100 MboI-cutting sites adjacent to the telomeres as the telomere regions.
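
Because MboI recognizes GATC, a region defined by a number of cutting sites can be located directly from the sequence. The sketch below is one possible reading of that definition, taking the n sites nearest to a reference coordinate; whether the count applies to each side or to both sides combined is not stated above, so this choice, like the function names, is an assumption.

```python
import re

def mboi_sites(seq):
    """0-based start positions of every GATC (MboI) site in seq."""
    # GATC is palindromic, so scanning one strand finds all sites.
    return [m.start() for m in re.finditer("GATC", seq.upper())]

def region_around(seq, centre, n_sites=100):
    """Half-open span covered by the n_sites MboI sites closest to centre."""
    nearest = sorted(mboi_sites(seq), key=lambda pos: abs(pos - centre))[:n_sites]
    if not nearest:
        return None
    return min(nearest), max(nearest) + 4  # +4 includes the last GATC itself

# Toy sequence with sites at positions 0, 10 and 20.
toy = "GATC" + "A" * 6 + "GATC" + "A" * 6 + "GATC"
print(mboi_sites(toy), region_around(toy, centre=11, n_sites=2))
```

For a telomere region the same helper could be called with `centre` set to the first or last coordinate of the chromosome.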

RNA-seq analysis. The BY4742 and SY14 cells from early-log phase (OD600 = 1) were collected, and their RNA was isolated using the TRIzol (Invitrogen) method. The library preparation followed the standard procedure (BGI). The libraries were sequenced on the Illumina HiSeq 4000 platform using the 150-bp paired-end sequencing strategy. For each sample, 6 Gb of clean data was obtained. The cleaned reads were mapped onto the S. cerevisiae S288C reference genome (GCF_000146045.2_R64) using Bowtie2 (v2.2.2), and the abundance estimation was conducted by RSEM (v1.3.0). The significant differentially expressed genes (DEG) were defined by DESeq2 (v1.16) with a definition of fold change of more than 2 and false discovery rate (FDR) < 0.01.
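
The DEG cut-off is stated as a fold change, whereas differential-expression tables usually report log2 fold-changes, so a short conversion may help. This sketch assumes the threshold is applied symmetrically to up- and downregulated genes; the function name and the FDR values are illustrative only.

```python
import math

def is_deg(log2_fold_change, fdr, min_fold=2.0, max_fdr=0.01):
    """True if the absolute fold change exceeds min_fold and FDR < max_fdr."""
    return abs(log2_fold_change) > math.log2(min_fold) and fdr < max_fdr

# A log2 fold-change of magnitude 1.56 corresponds to a fold change of 2**1.56.
print(round(2 ** 1.56, 2))
print(is_deg(-1.56, fdr=0.001))  # passes both thresholds
print(is_deg(0.5, fdr=0.001))    # fold change too small
```

Under this reading, a gene with a log2 fold-change of magnitude 1.56 changes by nearly threefold and therefore clears the twofold threshold, provided its FDR is below 0.01.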

Phenotypic microarray analysis. Cells freshly grown on YPAD plates were inoculated into yeast nutrient supplement mixture (NS ×48: 048 mM L-histidine HCl, 48 mM L-leucine, 24 mM L-lysine HCl and 1-44 mM uracil) and adjusted to a transmittance of 62% T using a Biolog turbidimeter (Biolog). Phenotype microarray plates PM1–PM10 (Biolog) coated with different nutrients and chemical substrates were used to incubate three cell types. For PM1–8, Biolog growth medium was prepared using IFY-0 (IFY-0 ×1.2) base inoculating fluid, supplemented with a mixture of 0.2% (v/v) yeast nutrient supplement mixture (NS ×48) and 0.013% (v/v) dye mix D (Biolog). To the growth medium, an extra 100 mM D-glucose had to be added for PM3–8. For PM9 and 10, Biolog growth medium was prepared using 0.67% (w/v) YNB w/o amino acids (Sunrise Science) and 0.12% (w/v) SC amino acid mixture (Sunrise Science) supplemented with a mixture of 0.01% (v/v) yeast nutrient supplement mixture (NS ×48), 0.01% (v/v) dye mix E (Biolog) and 100 mM D-glucose. The final volume of 12 ml was reached using reverse osmosis (RO) sterile water for every phenotypic microarray plate and added at 100 μl/well. Data were recorded photographically at 15 min intervals at 30 °C for 120 h and converted to a value reflecting metabolic activity by the OmniLog software (version 2.3.01).

Cell growth, morphology and cell cycle analysis. Strains BY4742 and SY14 were freshly streaked on plates, and three individual colonies were picked and inoculated in liquid YPAD medium overnight at 30 °C. The cell cultures were harvested and diluted in 25 ml of fresh liquid YPAD medium to a final OD600 of 0.1. The optical density of cells was measured hourly and exponentially growing cells were collected. The samples were prepared for scanning electron microscopy (Zeiss) as described previously.

Methods of cell synchronization and cell cycle analysis using flow cytometry (Beckman) were performed as previously described. In brief, the yeast strains were synchronized with 200 mM hydroxyurea for 1 h. Then cells were washed five times with pre-warmed YPAD for release from hydroxyurea. The cells were collected every 15 min, washed, and fixed in 70% ethanol at 4 °C overnight. Cells were then treated with RNase (20 mg/ml; Sigma) at 37 °C for 2–3 h. Samples were stained by PicoGreen (Invitrogen) and analysed by flow cytometry. Approximately 10° cells were analysed for each strain. Data were analysed with Summit 3.2.

Genotoxin sensitivity assay. Single colonies of the tested strains were cultured in YPD medium at 30 °C overnight. The cell numbers were adjusted to about equal for each strain, and tenfold serial dilutions were spotted onto plates with or without the indicated genotoxins. The plates were photographed after incubation at 30 °C for 2 to 3 days. For the temperature sensitivity assay, plates were incubated at 24 °C, 30 °C and 37 °C for 2 to 3 days before photography.

Growth competition. The HIS3 and URA3 genes were introduced into BY4742 and SY14 chromosomes, respectively. Approximately 1 × 10¥ exponentially grown cells of BY4742 and SY14 haploid cells were co-inoculated in fresh YPD medium at 30 °C. The co-cultures were diluted to 0.05% daily with fresh YPD. The co-cultures after 0, 1, 2 and 3 days were plated on YPD plates, and 100 colonies grown from these plates were spotted on both SC-His (synthetic complete medium without histidine) and SC-Ura (synthetic complete medium without uracil) plates to calculate the viable cells of BY4742 and SY14, respectively. The growth competition of BY4742/BY4742* and SY14/SY14* diploid cells was carried out similarly.
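
Two quantities follow directly from this design: the number of doublings between daily dilutions, and the fraction of each strain among the spotted colonies. The sketch below assumes that the culture regrows to its pre-dilution density every day, which is not stated above, and the colony counts in the usage line are invented for illustration.

```python
import math

def generations_per_day(dilution_fraction=0.0005):
    """Doublings needed to regrow to the pre-dilution density (0.05% = 0.0005)."""
    return math.log2(1 / dilution_fraction)

def strain_fractions(his_only, ura_only, both=0):
    """Fractions of colonies growing on SC-His only, SC-Ura only, or both."""
    total = his_only + ura_only + both
    return {
        "BY4742": his_only / total,  # HIS3-marked strain
        "SY14": ura_only / total,    # URA3-marked strain
        "both": both / total,        # colonies growing on both media
    }

print(round(generations_per_day(), 2))
print(strain_fractions(his_only=70, ura_only=30))
```

The `both` category is only relevant for the diploid competition, in which colonies able to grow on both media were observed.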

Immunofluorescence. A 3-myc epitope-tag sequence was inserted at the genomic locus of the SIR gene. Indirect immunofluorescence was performed as described previously. Cells were grown in YPD medium overnight to a density of 1-2» 0? cells and fixed for 30 min by incubation with 4% formaldehyde. Next, cells were washed with 0.1 M potassium phosphate (pH 6.5) and P solution (1.2 M sorbitol and | MPO), and re-suspended in P solution. Cells were subsequently treated with 0.1 mg/ml zymolyase (20T, MP Biomedicals) for 10 min, washed with P solution, and spotted on poly-L-lysine pre-treated slides. After rinsing in PBS-T buffer (PBS containing 0.1% Triton X-100 and 1% BSA), slides were incubated overnight at 4 °C with anti-Myc, anti-Rap1 and anti-Nop1 antibody diluted in PBS containing 1% BSA. Slides were then washed with PBS-T buffer and incubated with the appropriate secondary antibodies conjugated to Cy3 or Alexa 488. The DNA fluorescence signal was detected by DAPI (Il in phosphate buffered saline (PBS) solution) staining. Slides were mounted with PBS containing 1 mg/ml p-phenylenediamine, 25 μM NaOH, and 90% glycerol. Confocal microscopy (Leica) was performed on a Leica TCS SP2 microscope with a 63× lambda blue objective (oil). Image processing included similar filtration and threshold level standardization for all images.

Switching the MATα mating-type of the SY14 strain to MATa to generate the SY14* strain. Most sequences of the MATα and MATa loci in different mating-type yeasts are identical, except that a 703 bp region in the MATα locus (5′-ctgatinyecttcggggaa...gtagagiggttgacgaataatt-3′) is different from an 807 bp region in the MATa locus (5′-taigtctagtatgtggatiana...gtigacgataatatgt-guage). We used the CRISPR–Cas9 system to replace the 703 bp sequence with the 807 bp sequence in the BY4742 and SY14 strains, to generate the BY4742* and SY14* strains, respectively.

Mating and sporulation assays. Mating of MATα and MATa strains was performed on a YPD plate by micromanipulation. Colonies of the resulting diploid strains were re-streaked on a new YPD plate to obtain a single clone. The diploids were verified by PCR amplifications using the two pairs of primers specific for the different mating-type loci. For sporulation, diploid strains were inoculated into 5 ml of sporulation medium to a density of OD600 = 0.6 and cultivated at 23 °C. A haemocytometer was used to count a total of >200 cells every 24 h, and sporulation efficiency was measured by the ratio of the number of asci containing 3 or 4 spores to that of the unsporulated cells. Spores were dissected, and tetrads were dissected to measure spore viability.
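
Both readouts reduce to simple ratios, sketched below. The sporulation measure follows the wording above (asci with 3 or 4 spores relative to unsporulated cells). The spore counts in the usage lines are illustrative values chosen to match a viability of 87.5% over 60 tetrads; they are not data taken from the paper, and the function names are hypothetical.

```python
def sporulation_ratio(asci_with_3_or_4_spores, unsporulated_cells):
    """Ratio of asci containing 3 or 4 spores to unsporulated cells."""
    return asci_with_3_or_4_spores / unsporulated_cells

def spore_viability(viable_spores, tetrads, spores_per_tetrad=4):
    """Fraction of dissected spores that formed colonies."""
    return viable_spores / (tetrads * spores_per_tetrad)

# 60 tetrads hold 240 spores; 210 viable spores would give 87.5%.
print(spore_viability(viable_spores=210, tetrads=60))
print(sporulation_ratio(asci_with_3_or_4_spores=50, unsporulated_cells=150))
```

Expressed per tetrad, a viability of 87.5% means that on average one spore in every two tetrads fails to grow.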

Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

Data availability. Genome sequencing data and the assembled genome sequences of SY14 and WT have been submitted to NCBI with a project accession number of PRJNA429985. The Hi-C sequencing data of SY6, SY13, SY14 and BY4742 have been submitted to NCBI with a project accession number of PRINA4S1I61. The RNA-seq data have been submitted to NCBI with a project accession number of PRJNAGS1522. All data can be viewed in NODE (http://www.biosino.org/node) by pasting the accession (NODEP00371807) into the text search box or through the URL http://www.biosino.org/node/project/detail/NODEP00371807.

Extended Data Fig. 1 | Theoretical XhoI digestion pattern of chromosome ends. The X, STR and Y′ elements in each subtelomeric region are marked with black, grey and white boxes, respectively. The TG1–3 telomeric sequences are marked with arrow tips. The XhoI digestion sites in telomere regions are indicated, and the numbers in kb in parentheses indicate the sizes of DNA fragments recognized by the TG1–3 probe in the Southern blot analysis.

Extended Data Fig. 2 | De novo sequence comparison of BY4742 (light grey) and SY14 (dark grey) genomes. The chromosomes are labelled with Roman numerals of the yeast reference genome. The telomeres (blue), centromeres (red) and telomere-associated repeats (green) that were cut by experimental design are shown in BY4742 chromosomes. Sequence deletions and insertions identified by genomic comparison between BY4742 and SY14 are highlighted in purple and black, respectively, in SY14 chromosomes.

Extended Data Fig. 3 | Comparison of the chromosomal interactions of SY6, SY13 and SY14 cells with those of BY4742 cells. a, Z-score difference heatmaps (SY6 minus BY4742, SY13 minus BY4742 and SY14 minus BY4742). Bin length, 10 kb; red and blue show increased and decreased chromatin interactions, respectively. Green box highlights the interactions of the chromosome XV centromere with other chromosomes. b, Venn diagram of the number of significant (P < 0.01, q < 0.01) inter- and intra-chromosomal interactions (referring to their locations in the BY4742 genome). c, Strong chromosomal interactions of chromosome XV centromere regions in the BY4742, SY6, SY13 and SY14 genomes. The red bars indicate the centromeres and their flanking regions of 150 MboI restriction sites. Each arc throughout the chromosome XV centromere area represents one strong interaction. In SY6, SY13 and SY14, the reserved interactions are marked with black arcs and new interactions are marked with orange arcs. The green arrowheads mark the ten residual interactions near the centromere regions found in all four strains. d, 3D structures of chromosomes XV and II in SY6, SY13 and SY14 cells compared to those in BY4742 cells. The locations of the 10 residual interactions on Chr. XV and II are marked green.

Extended Data Fig. 4 | 3D structures of single chromosomes. Chromosome structures in SY6, SY13 and SY14 cells are compared to those in BY4742 cells.

Extended Data Fig. 5 | Directional preference plots of SY6, SY13 and SY14 cells compared to BY4742 cells. Red, BY4742; moss green, SY6; purple, SY13; bright green, SY14. The y-axis denotes the t-test value between the upstream and downstream interactions of each bin. A positive t-value indicates that a bin has more downstream interactions, as described previously.

Extended Data Fig. 6 | Growth competition between SY14/SY14* and BY4742/BY4742* diploid cells. a, Blue circles represent BY4742/BY4742* (with HIS3 marker) cells that could grow only on SC-His plates; pink triangles represent SY14/SY14* (with URA3 marker) cells that could grow only on SC-Ura plates; green diamonds represent fusion cells of BY4742/BY4742* and SY14/SY14* that could grow on both SC-His and SC-Ura plates. Data from three biological replicates are presented. b, FACS analysis of DNA content of BY4742/BY4742* and SY14/SY14* diploid cells before and after co-culture. Data are representative of two independent experiments. c, PCR verification of genomes from BY4742/BY4742* and SY14/SY14* diploid cells. H1–H3: colonies grown only on SC-His plates. HUL-: colonies grown on both SC-His and SC-Ura plates. The BY4742/BY4742* and SY14/SY14* diploid cells before co-cultivation were used as control. Two pairs of primers, specific for genomes of BY4742/BY4742* and SY14/SY14*, were used. Data shown are representative images of two independent experiments.

Extended Data Table 1 | Details of the creation of a single chromosome

Extended Data Table 2 | Information regarding long repeat sequences near chromosome ends

Extended Data Table 3 | SNPs and indels confirmed by re-sequencing
+ Errors located within recombination regions during chromosome fusions, but not located within primer binding sites.
Errors located within primer binding sites.

Extended Data Table 4 | Differentially expressed genes in SY14 compared to BY4742 cells

Ferroelectric switching of a two-dimensional metal

Zaiyao Fei, Wenjin Zhao, Tauno A. Palomaki, Bosong Sun, Moira K. Miller, Zhiying Zhao, Jiaqiang Yan, Xiaodong Xu & David H. Cobden


A ferroelectric is a material with a polar structure whose polarity can be reversed (switched) by applying an electric field. In metals, itinerant electrons screen electrostatic forces between ions, which explains in part why polar metals are very rare. Screening also excludes external electric fields, apparently ruling out the possibility of ferroelectric switching. However, in principle, a thin enough polar metal could be sufficiently penetrated by an electric field to have its polarity switched. Here we show that the topological semimetal WTe2 provides an embodiment of this principle. Although monolayer WTe2 is centro-symmetric and thus non-polar, the stacked bulk structure is polar. We find that two- or three-layer WTe2 exhibits spontaneous out-of-plane electric polarization that can be switched using gate electrodes. We directly detect and quantify the polarization using graphene as an electric-field sensor. Moreover, the polarization states can be differentiated by conductivity and the carrier density can be varied to modify the properties. The temperature at which polarization vanishes is above 350 kelvin, and even when WTe2 is sandwiched between graphene layers it retains its switching capability at room temperature, demonstrating a robustness suitable for applications in combination with other two-dimensional materials.

A polar material contains an axis (referred to as the polar axis) along which the two opposite directions are distinguishable. This property is necessary for the existence of a spontaneous electric polarization. Of the 32 three-dimensional crystal classes, the ten that have a polar axis are known as the pyroelectrics, because heating them changes any electric polarization along this axis to produce a voltage. When Anderson and Blount introduced the term ferroelectric metal in 1965, they were referring to the possibility of polar structure appearing in certain metallic crystals upon cooling. However, they assumed that, even if such polar metals existed, the polarity would not be switchable. Definite cases of metals with polar structure have been identified only very recently.

Several ferroelectric insulators have been found to maintain ferroelectric characteristics in ultrathin films. However, when materials with a layered structure are thinned towards the monolayer limit their properties often change qualitatively. This is illustrated by, for example: graphene, which becomes a two-dimensional Dirac metal; MoS2, which changes from an indirect- to a direct-gap semiconductor; and CrI3, which varies between being antiferromagnetic and ferromagnetic. Another example is the topological semimetal WTe2, which becomes either a two-dimensional topological insulator or a superconductor at low temperatures in the monolayer limit, depending on the level of electrostatic doping. Here we focus on another aspect of WTe2: the fact that it is a polar metal. Its three-dimensional (1T′) structure has a polar space group, Pnm2₁, and it remains metallic down to a thickness of three layers when undoped and a monolayer when electrostatically doped. We show here that as WTe2 approaches this limit the polarity can be switched, making it effectively ferroelectric even when it is metallic in the plane.

The 1T′ structure (Fig. 1a) contains b–c mirror (M) and a–c glide (G) planes, so the polar axis, which must be parallel to both of them, is the c axis, perpendicular to the layers. We apply an electric field along this axis using the device geometry indicated in Fig. 1b. An electrically contacted thin WTe2 flake is sandwiched between two hexagonal boron nitride (h-BN) dielectric sheets, with thicknesses of $d_t$ (top) and $d_b$ (bottom). Above and below are gate electrodes, usually of few-layer graphene, to which voltages $V_t$ and $V_b$ are applied relative to the grounded WTe2 (see Methods, Extended Data Fig. 1 and Extended Data Table 1 for device fabrication and characterization).

We define the applied electric field passing upwards through the layer, which will couple to out-of-plane polarization, as $E_\perp = (-V_t/d_t + V_b/d_b)/2$, where $V_t$ and $V_b$ are the top and bottom gate voltages and $d_t$ and $d_b$ are the thicknesses of the top and bottom h-BN. When $E_\perp$ is swept up and down, in the conductance of trilayer (Fig. 1c) and bilayer (Fig. 1d) devices we observe bistability near $E_\perp = 0$, characteristic of ferroelectric switching, at all temperatures from 4 K to above room temperature. No bistability is seen in monolayer WTe2 (Fig. 1e), consistent with its structure having a centre of symmetry (Fig. 1e inset, red circles) and hence being non-polar; this symmetry also rules out instabilities involving charge injection into the h-BN. Nor is bistability seen in thicker crystals, including when one is used as a gate electrode (Extended Data Fig. 2). This, and the larger field required to switch the trilayer device than the bilayer device, can be explained by screening of $E_\perp$ on a length scale of nanometres.

We saw similar bistability in all bilayer devices (Extended Data Fig. 3). To prove that it is associated with out-of-plane electric polarization, we made devices in which the top gate is replaced by monolayer graphene, the conductivity of which is sensitive to the precise electric field E_t in the upper h-BN. In Fig. 2 we present measurements at a series of temperatures on such a bilayer WTe2 device (B2) with four gold contacts to the top graphene (Fig. 2a; Extended Data Fig. 4). If the WTe2 acts as a conducting sheet then it will screen out any electric field due to a voltage applied to the bottom gate. Indeed, Fig. 2b demonstrates that the conductance G_gr of the graphene depends only very weakly on V_b, except in a certain interval where it jumps between two states. The conductance of the WTe2 is bistable in precisely the same interval (Extended Data Fig. 4). The two states must correspond to different values of E_t that can occur for exactly the same set of applied bias voltages. This implies the existence of two different vertical distributions of charge in the bilayer WTe2. We deduce that sweeping the bottom gate changes E⊥ (here E⊥ = V_b/(2d_b) because V_t = 0), which at the ends of the hysteresis loop flips the polarization state (henceforth denoted by P↑ or P↓), changing E_t by an amount δE_t and so changing G_gr.

We infer δE_t by applying a bias V_w to the WTe2 and measuring the change δV = d_t δE_t that is required to produce the same change in G_gr (Fig. 2c). For the simplified case of d_t = d_b and all voltages at zero, the electrostatic potential profile is inverted between P↑ (red) and P↓ (green), as sketched in Fig. 2d, and the areal polarization density is P ≈ ε0 δV (Methods), where ε0 is the vacuum permittivity. At 20 K, this gives P ≈ 1 × 10^4 e per cm, which is equivalent to transferring about 2 × 10^11 electrons per cm^2 between the two layers, a distance of about 0.7 nm. This is three orders of magnitude lower than the volume polarization density of around 0.2 C m^−2 ≈ 10^14 electrons per cm^2 in the classic ferroelectric BaTiO3. Combined with the micrometre-scale

Fig. 1 | Evidence for ferroelectric switching in WTe2. a, Structure of three-dimensional 1T′ WTe2, showing the mirror plane (M; dashed), glide plane (G; dotted) and polar c axis (red arrow, up; green arrow, down). W atoms are blue; Te atoms are orange. b, Schematic cross-section of the device geometry used to apply an electric field E⊥ normal to an atomically thin WTe2 flake. c, d, Conductance G of undoped trilayer device T1 (c) and bilayer device B1 (d) as E⊥ is swept up and down (black arrows), setting V_t/d_t = −V_b/d_b to avoid net doping. The plots show bistability associated with electric polarization up (red arrow) or down

device size, such a small polarization makes it very hard to detect the ferroelectricity using standard displacement current measurements.
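
As a rough numerical check of the estimate P ≈ ε0 δV, the following Python sketch converts a potential step δV into an areal polarization and an equivalent interlayer charge transfer. The input δV = 20 mV is an assumption inferred from the statement elsewhere in the text that 2δV ≈ 40 mV; the function names are illustrative and not part of the original analysis.

```python
# Sketch: areal polarization from P ≈ eps0 * dV (simplified model, d_t = d_b).
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C


def areal_polarization(delta_v):
    """Areal polarization density P (C/m) for a potential step delta_v (V)."""
    return EPS0 * delta_v


def polarization_in_e_per_cm(p_c_per_m):
    """Convert P from C/m to elementary charges per cm."""
    return p_c_per_m / E_CHARGE / 100.0


def transferred_density_per_cm2(p_c_per_m, layer_spacing_m):
    """Electrons per cm^2 that, moved across layer_spacing_m, give polarization P."""
    return p_c_per_m / (E_CHARGE * layer_spacing_m) / 1.0e4


delta_v = 0.020                      # V, assumed (half of the quoted 2*dV ≈ 40 mV)
p = areal_polarization(delta_v)
p_e_cm = polarization_in_e_per_cm(p)
n_transfer = transferred_density_per_cm2(p, 0.7e-9)   # 0.7 nm layer spacing from the text
batio3 = 0.2 / E_CHARGE / 1.0e4      # 0.2 C/m^2 expressed in electrons per cm^2

print(f"P ≈ {p_e_cm:.2e} e/cm")
print(f"equivalent transfer ≈ {n_transfer:.2e} electrons/cm^2")
print(f"BaTiO3 ≈ {batio3:.2e} electrons/cm^2")
```

Under these assumptions the result is of order 10^4 e per cm and 10^11 electrons per cm^2, roughly three orders of magnitude below the BaTiO3 figure, in line with the values quoted above.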

In Fig. 2e we plot δV as a function of T. Between about 0 K and 300 K it decreases roughly linearly with T, extrapolating to zero at roughly 450 K. However, above about 340 K the signal becomes unstable and we can no longer identify a hysteresis loop, suggesting that a transition to a non-polar state occurs in this temperature range.

We also made a simpler device with no top h-BN and monolayer graphene directly encapsulating the bilayer WTe2. It exhibited highly reproducible hysteresis in the conductance, visible up to 300 K (Extended Data Fig. 5). This result demonstrates that the ferroelectric switching is robust enough for potential applications at room temperature that use it in combination with other two-dimensional materials.

We also investigated the effect of gate-induced charge doping, defined by n_e = ε_hBN ε0 (V_t/d_t + V_b/d_b)/e, where e is the electron charge and ε_hBN is the relative permittivity of h-BN. If the material were a simple metal, n_e would be the areal density of added electrons. In Fig. 3a, b we plot the conductance G at 7 K for device B1 (the same bilayer as in Fig. 1d), as a joint function of V_t and V_b, measured with V_t stepped and V_b swept up or down. Each sweep was started in the same fully polarized state. The black dashed lines in Fig. 3a, b denote E⊥ = 0 and the white dashed lines denote n_e = 0. The two plots differ only in the central hysteretic region, as is made clearer by plotting the difference between them (Fig. 3c). Similar behaviour is seen at higher temperature (Fig. 3d, at 200 K). At E⊥ = 0, G is a similar function of n_e for both P↑ and P↓ (Fig. 3e), with a temperature dependence that is insulating near n_e = 0 and metallic for n_e > n_c, with critical density n_c ≈ 2 × 10^12 cm^−2, as reported previously. In Fig. 3f we show traces obtained by sweeping E⊥ repeatedly up and down for selected values of n_e at 7 K. In each case the single conductance level at large E⊥ evolves smoothly and reproducibly into one of the two stable levels as E⊥ is reduced to zero, implying that the state remains uniformly polarized, without domain structure, at E⊥ = 0. For small or negative n_e the effect of E⊥ is large and of opposite sign for P↑ and P↓, producing


(green arrow), at temperatures from 4 K to 300 K (as labelled). Here the conductance is the reciprocal of the four-terminal resistance. The undoped trilayer has a metallic temperature dependence, the bilayer an insulating one. Inset to c, optical image of a representative double-gated device. The WTe2 flake has been artificially coloured red. Scale bar, 10 µm. e, Similar measurements on a monolayer WTe2 device (M1), showing no bistability. At 4 K, conduction is in the quantum spin Hall regime. Insets, location (red circle) of centre of symmetry in the monolayer, viewed along the b (left) and a (right) axes.


butterfly-shaped hysteresis loops. For n_e well above n_c, E⊥ has less effect on the conductance and the hysteresis is smaller, but still present. Hence, the doped bilayer device, like the trilayer device, is simultaneously ferroelectric and metallic.
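
The two gate voltages act as independent knobs for the perpendicular field E⊥ and the doping n_e. A minimal Python sketch of the mapping in both directions follows, using the definitions given in the text. The relative permittivity ε_hBN = 4 is an assumed value (the text does not state it), and the thicknesses in the example are those listed for device B1 in Extended Data Table 1.

```python
# Sketch: convert between gate voltages (V_t, V_b) and (E_perp, n_e).
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS_HBN = 4.0                # ASSUMED relative permittivity of h-BN


def field_and_doping(v_t, v_b, d_t, d_b, eps_hbn=EPS_HBN):
    """Return (E_perp in V/m, n_e in m^-2) from gate voltages (V) and thicknesses (m)."""
    e_perp = (-v_t / d_t + v_b / d_b) / 2.0
    n_e = eps_hbn * EPS0 * (v_t / d_t + v_b / d_b) / E_CHARGE
    return e_perp, n_e


def gate_voltages(e_perp, n_e, d_t, d_b, eps_hbn=EPS_HBN):
    """Invert the mapping: gate voltages that produce a target (E_perp, n_e)."""
    s = n_e * E_CHARGE / (eps_hbn * EPS0)   # equals V_t/d_t + V_b/d_b
    v_t = d_t * (s / 2.0 - e_perp)
    v_b = d_b * (s / 2.0 + e_perp)
    return v_t, v_b


# Example: device B1 (12 nm top, 20 nm bottom h-BN), E_perp = 0.1 V/nm at zero doping.
d_t, d_b = 12e-9, 20e-9
v_t, v_b = gate_voltages(1.0e8, 0.0, d_t, d_b)
print(v_t, v_b)
print(field_and_doping(v_t, v_b, d_t, d_b))
```

At zero doping the inversion reduces to V_t/d_t = −V_b/d_b, the condition used for the sweeps in Fig. 1c, d.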

At low temperatures (Fig. 3) we observe an increase in the width of the hysteresis loop for increasingly negative n_e, whereas at 200 K (Fig. 3d) it is almost independent of n_e. When the conductance jumps there is some stochastic variation in the positions and substructure of the jumps, which is indicative of domain dynamics. If the surrounding gates were not present to screen the depolarization field, domains would inevitably form to limit the electrostatic energy, as observed in other ultrathin ferroelectrics. In our devices, defects such as rips, bubbles and folds could nucleate domains or pin domain walls. In addition, E⊥ is not completely uniform, because above and near the platinum contacts it is reduced by screening. Indeed, the pattern of switching depends on the choice of measurement contacts within a given device (Extended Data Fig. 6). We also observe that in some bilayer devices, such as B2 (Fig. 2), the switching field is not symmetric about E⊥ = 0. A possible explanation for this is that sometimes, despite all precautions, during device fabrication one side of the WTe2 flake was exposed to mild oxidation, producing asymmetric trapped charge.

The fact that the conductance is sensitive to the polarization is consistent with the expectation that the polarization redistributes charge between the layers, which are inequivalent when E⊥ is non-zero. Although the specific mechanisms for the sensitivity to n_e, E⊥ and P are still under investigation, we remark on the following. First, the monolayer conductance at 4 K in Fig. 1e, which we know is due to edge conduction because this is the established quantum spin Hall regime, is almost independent of E⊥. Second, in bilayers at large positive or negative n_e the reversal of P has a similar effect on the conductance to that of changing E⊥ by approximately 0.15 V nm^−1 at 7 K (indicated by the dotted horizontal line in Fig. 3f). This change in E⊥ corresponds to a change in the electrostatic potential difference between the two WTe2 layers by about 100 mV. This is of the same order

and schematic cross-section (right) of a bilayer WTe2 device (B2) with multiply contacted graphene in place of the top gate, indicating separately the electric fields in the h-BN above (E_t) and below (E_b) the WTe2. Scale bar, 5 µm. b, The graphene conductance G_gr is measured when a bias V_b is applied to the bottom gate with the intervening WTe2 grounded, at a series of temperatures (as labelled). The two conductance states seen for the two sweep directions (black arrows) are associated with different out-of-plane polarization states of the WTe2 (red and green arrows as in Fig. 1). c, The behaviour of G_gr (the y axis is the same as in b) when a voltage V_w is applied directly to the WTe2 provides a mapping to the difference δE_t = δV/d_t in E_t between the two states. d, Sketch indicating how the reversal of the polarization changes the electrostatic potential (from red to green) and E_t (see text). e, Temperature dependence of δV, which is proportional to the polarization.


as the estimated change in the potential difference associated with the polarization reversal, 2δV ≈ 40 mV, suggesting that the potential imbalance between the layers governs the sensitivity of the conductance to both E⊥ and P. It is also roughly the same as the width of the hysteresis loop; that is, the polarization flips roughly when the applied potential difference exceeds the potential due to the spontaneous polarization. This is another indicator that electron transfer between the layers may be involved. Third, the very sharp minimum seen in G close to n_e = 0 in bilayers (Fig. 3e) presumably marks the compensation point at which electron and hole densities are exactly equal, suggesting that electron–hole correlation may be important. Taken together, the above observations raise the possibility that electron–hole correlation effects, rather than a lattice instability, drive the spontaneous polarization in WTe2. If this is the case, then the polarization could principally involve a motion of the electron cloud relative to the ion cores, rather than a lattice distortion, in which case the switching would be intrinsically very fast.

Ferroelectricity adds another ingredient to the intriguing combination of quantum spin Hall edges, correlation effects and superconductivity already seen in atomically thin WTe2. Although the quantum spin Hall behaviour and superconductivity are restricted to


G of bilayer device B1 at 7 K as a function of both gate voltages, for the two sweep directions of V_b as indicated by the white arrows. c, Difference between a and b at 7 K, which is non-zero in the hysteretic regime. d, Same measurement as in c but at 200 K. In a–d, black and white dashed lines indicate contours of zero perpendicular field and zero charge density n_e, respectively. e, Variation in G with n_e at E⊥ = 0 for both polarization states (up, dashed; down, solid) at two temperatures (as labelled). The dashed bar near the bottom indicates the range of n_e in a–d, and n_c is the critical density (see text). f, Sweeps of E⊥ for fixed n_e (as labelled) at 7 K. The dotted bar near the bottom indicates the magnitude of the approximate shift in E⊥ of the conductance minimum between the opposite polarization states.


the centrosymmetric monolayer and ferroelectricity occurs only for two or more layers, it is possible that these diverse phenomena are connected in ways that may also be relevant to understanding the properties that emerge in the three-dimensional limit, including extreme and anisotropic magnetoresistance, a polar axis and Weyl points.

METHODS
Preparation and characterization of WTe2 devices. We measured devices with four different layouts: (1) WTe2 with graphite gates above and below (M1, B1, B4, T1); (2) bilayer WTe2 with monolayer graphene as top gate (B2); (3) a bilayer WTe2/graphene heterostructure (B3); and (4) a monolayer graphene device gated by few-layer WTe2 (F1). In the following, we describe fabrication of the first type; the others are similar.

First, graphite and h-BN crystals were mechanically exfoliated under ambient conditions onto substrates consisting of 285-nm thermal SiO2 on highly p-doped silicon. Graphite flakes 2–6 nm thick were chosen for the top and bottom gates and 5–30-nm-thick h-BN flakes (a layered electrical insulator free of trapped charges and dangling bonds) were chosen for the top and bottom dielectric. The top and bottom parts were prepared separately using a polymer-based dry transfer technique. For the bottom part, an h-BN flake was picked up on a polymer stamp and placed on the bottom graphite. After dissolving the polymer, Pt metal contacts (about 8 nm) were patterned on the h-BN by standard e-beam lithography, e-beam evaporation and lift-off. For the top part, the top graphite was picked up first, then the top h-BN. Both stacks were then transferred to an oxygen- and water-free glovebox. WTe2 crystals were exfoliated inside the glovebox and flakes from monolayer to trilayer thickness were optically identified and quickly picked up with the top part. The stack was then completed by transferring onto the lower contacts/h-BN/graphite stack before taking it out of the glovebox. Finally, after dissolving the polymer, another step of e-beam lithography and metallization was used to define electrical bonding pads (Au/V) connecting to the metal contacts and the top and bottom gates. Extended Data Fig. 1 shows schematics of the fabrication processes and optical and atomic force microscope (AFM) images of a typical bilayer WTe2 device (B4).
Estimate of the electric polarization. We use the following simple model to estimate the spontaneous polarization of the bilayer WTe2 from the measurements in Fig. 2. We assume that d_t = d_b ≫ d, where d is the thickness of the WTe2; that all conductors (bottom graphite gate, top graphene and bilayer WTe2) are grounded and have infinite electronic compressibility; and that the areal polarization density P is associated with two thin sheets of areal charge density ±P/d separated by d. Under these assumptions, when the polarization reverses there is no net flow of charge between the conductors and the WTe2 remains neutral, and the potential profile between the gates is simply reversed when the polarization flips (Fig. 2d). By Gauss's law,

$$\varepsilon_{0}\varepsilon_{\mathrm{hBN}} E_{t} = \varepsilon_{0} E_{i} + P/d \qquad (1)$$

where E_t is the electric field in the h-BN (equal on both sides because the bilayer is neutral) and E_i is the field between the two charge sheets. Because the top graphene and the centre of the bilayer are both at zero potential,

$$2 d_{t} E_{t} + d\,E_{i} = 0 \qquad (2)$$

From equations (1) and (2),

$$E_{t} = \frac{P}{\varepsilon_{0}\,(2 d_{t} + \varepsilon_{\mathrm{hBN}}\, d)} \qquad (3)$$

The change in E_t when the polarization reverses is then δE_t = 2E_t = 2P/[ε0(2d_t + ε_hBN d)]. With d_t ≈ 10 nm and d ≈ 1 nm, the first term in the denominator dominates, so δE_t ≈ P/(ε0 d_t) and thus P ≈ ε0 d_t δE_t = ε0 δV. In reality, d_t and d_b can differ by a factor of up to three, the conductors have finite compressibility and the polarization charge is more spread out, which taken together introduce an extra numerical coefficient of order unity.
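
For readers who want the intermediate algebra, the elimination step between the Gauss-law relation and the final estimate can be written out as below. Here E_t denotes the field in the h-BN and E_i the field between the charge sheets, as in the model above. The numerical value ε_hBN ≈ 4 used in the last line is an assumption (it is not stated in the text), so the resulting factor is indicative only.

```latex
\begin{align}
  E_i &= -\frac{2 d_t}{d}\,E_t
      && \text{(zero potential at graphene and bilayer centre)} \\
  \varepsilon_0 \varepsilon_{\mathrm{hBN}} E_t
      &= -\frac{2 \varepsilon_0 d_t}{d}\,E_t + \frac{P}{d}
      && \text{(substituted into Gauss's law)} \\
  E_t &= \frac{P}{\varepsilon_0\,(2 d_t + \varepsilon_{\mathrm{hBN}} d)} \\
  \delta V = d_t\,\delta E_t = 2 d_t E_t
      &= \frac{P}{\varepsilon_0}\,
         \frac{1}{1 + \varepsilon_{\mathrm{hBN}} d / (2 d_t)} \\
  \text{with } d_t = 10~\mathrm{nm},\ d = 1~\mathrm{nm},\
  \varepsilon_{\mathrm{hBN}} \approx 4:\quad
  \delta V &\approx 0.83\,\frac{P}{\varepsilon_0}
\end{align}
```

With these assumed numbers the thin-bilayer approximation P ≈ ε0 δV underestimates P by roughly 20%, which is within the order-unity uncertainty already acknowledged for the model.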

Removing parallel (parasitic) conduction through the graphene in device B2. In device B2 the graphene extends over regions with no WTe2 underneath so that it acts as a uniform gate for the entire WTe2 sheet. The quantity that we call G_gr is the result of the following measurement, which maximizes sensitivity to only a central region of graphene above the WTe2. First, we ground two opposing contacts to the graphene and measure only the current that flows from the biased contact to the one opposite, as shown in Fig. 2a. However, because of finite contact resistance, a small portion of this current still flows through graphene not above the WTe2. To remove this parasitic current component, we set the WTe2 voltage V_w such that the graphene is at its Dirac-point minimum in the region over the WTe2. Because the minimum is quite broad, the graphene over the WTe2 is then insensitive to V_b and the measured dependence on V_b comes from only the parasitic component, which can then be subtracted out. Note that removing it has no effect on the magnitude of the hysteresis.

In Extended Data Fig. 4b, we illustrate this procedure at 220 K. From the inset of Extended Data Fig. 4b, we determine that the graphene above the WTe2 is at its Dirac point at V_w = 129 mV. The red curve shows the conductance of the graphene G_gr when V_w = 129 mV; the dependence on the back gate is from only the parasitic contribution. Conversely, in Fig. 2 and the blue curve in Extended Data Fig. 4b we measure G_gr at V_w = 0 mV, at which the graphene is most sensitive to changes in the electric field in the top h-BN, E_t, yet also contains the parasitic conductance. The difference between these two curves (at V_w = 129 mV and V_w = 0 mV) is shown in black. The hysteresis remains, whereas the 'V' shape is mostly removed. The remaining small slope can be explained by the finiteness of the electronic compressibility of the bilayer WTe2.

Using Extended Data Fig. 4b we can estimate the ratio of the parasitic current to that flowing above the WTe2. The area with no WTe2 has an h-BN thickness of 33 nm between the graphene and bottom gate. The red curve (with a V_b dependence) has a maximum slope of dG_gr/dV_b = 17 µS V^−1, or dG_gr/dE = 560 µS V^−1 nm after taking into account the h-BN thickness. From the inset curve, using voltage V_w applied to the WTe2 for gating (with 8 nm h-BN) gives dG_gr/dE = 12,300 µS V^−1 nm. Thus, the parasitic component is only about 5% of the total current.
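
The arithmetic behind this estimate is short enough to check directly. The sketch below rescales the gate-voltage slope by the dielectric thickness (since E = V/d, dG/dE = (dG/dV)·d) and compares it with the reference slope. The 33 nm thickness is taken as the sum of the 8 nm top and 25 nm bottom h-BN listed for device B2 in Extended Data Table 1; the function name is illustrative.

```python
# Sketch: parasitic fraction of the graphene current in device B2.

def slope_per_field(dg_dv, hbn_thickness_nm):
    """Convert a conductance slope per volt into a slope per unit field.

    With E = V / d, dG/dE = (dG/dV) * d. Units follow the inputs
    (for example uS/V times nm gives uS V^-1 nm).
    """
    return dg_dv * hbn_thickness_nm


parasitic_slope = slope_per_field(17.0, 8.0 + 25.0)   # region with no WTe2 underneath
reference_slope = 12300.0                              # graphene gated through 8 nm h-BN
parasitic_fraction = parasitic_slope / reference_slope

print(f"parasitic slope ≈ {parasitic_slope:.0f}")
print(f"parasitic fraction ≈ {100 * parasitic_fraction:.1f}%")
```

The rescaled slope comes out close to the quoted 560, and the fraction is a little under 5%, consistent with the statement above.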
Data availability. The data presented in this paper and that support the findings of this study are available from the corresponding author on reasonable request.

Extended Data Fig. 1 | Bilayer WTe2 device. a, Essential steps in device fabrication. b, Optical image of device B4. The red dashed line outlines the bilayer flake. Scale bar, 5 µm. c, AFM topography image of the central region in b. Scale bar, 2 µm. d, Line cut along the white dashed line in c. The step height matches the expected bilayer thickness, about 1.4 nm.


Extended Data Fig. 2 | Thick WTe2 used as a gate. a, Device F1, in which a thick (8 nm) WTe2 flake underneath is used as a gate for a top graphene sheet. Scale bar, 10 µm. b, Schematic cross-section of the device. c, Two-terminal conductance G of the graphene as a function of voltage V_g applied to the WTe2 flake. There is no sign of switching or bistability at any temperature, indicating that no polarization reversal occurs at the WTe2 surface or inside of it. Inset, close-ups of the graphene Dirac point at 4 K and 300 K.

Extended Data Fig. 3 | Switching of an additional bilayer device. a, Conductance G versus perpendicular electric field E⊥ at temperatures from 4 K to 300 K and a gate doping level of n_e = −4 × 10^12 cm^−2 for device B4. b, Conductance difference ΔG between the two sweep directions of V_b at 200 K, as plotted in Fig. 3d for device B1.

Extended Data Fig. 4 | Additional transport measurements and removal of parasitic effects in the polarization measurements. a, Conductance versus V_b for the bilayer WTe2 in device B2, measured with the top graphene grounded. The hysteresis occurs in exactly the same range of E⊥ as it does in the graphene conductance in Fig. 2. Note that both n_e and E⊥ change when V_b is swept. The inset shows a schematic configuration of the measurement. b, Graphene conductance G_gr at 220 K as a function of V_b with the voltage V_w on the bilayer WTe2 at 0 mV (blue) and 129 mV (red). The black curve is the difference between the blue and red curves. This subtraction removes most of the V_b dependence of the parasitic current that flows through the top graphene, which is not screened from the bottom gate by the WTe2. Inset, graphene conductance showing the minimum at V_w = 129 mV.

Extended Data Fig. 5 | Graphene/bilayer WTe2 heterostructure showing hysteresis up to room temperature. a, b, Device image (a) and schematic cross-section (b). c, The two-terminal conductance G shows bistability at both 5 K and room temperature (300 K), implying that the polarization of the WTe2 is still present in this hybrid structure.

Extended Data Fig. 6 | Length-dependent ferroelectric behaviour in trilayer WTe2 for temperatures from 2 K to 300 K. All measurements are performed at V_t = 0 in two-terminal configurations, where the contact separation ranges from 200 nm to 1,490 nm. For all devices mentioned above and in the main text, the contacts are separated by 1–2 µm. However, if we reduce the contact separation to a few hundred nanometres (270 nm), the metal contacts prevent the polarization from switching. For a contact separation (L) of more than 480 nm, the transfer characteristics show similar hysteretic behaviour as in Fig. 1c, d and Extended Data Fig. 3a. Because V_t is always grounded, E⊥ and n_e change simultaneously as we sweep V_b.

Extended Data Table 1 | Thickness of h-BN dielectrics and corresponding areal capacitances for the WTe2 devices

Device label | WTe2      | Top h-BN (nm) | Bottom h-BN (nm) | C_t (10^−3 F/m^2) | C_b (10^−3 F/m^2)
M1           | monolayer | 6             | 28               | 5.9               | 1.3
B1           | bilayer   | 12            | 20               | 3.0               | 1.8
B2           | bilayer   | 8             | 25               | 4.4               | 1.4
B3           | bilayer   | NA            | 24               | NA                | 1.5
B4           | bilayer   | 10            | 21               | 3.5               | 1.7
T1           | trilayer  | 5.5           | 23               | 6.4               | 1.5
F1           | 8 nm      | 24            | NA               | 1.5               | NA
ie te te gteneced ey blancs oe r= (GM ean E(k + Cana wr egomarc capac Cian Gan ate 


Slr cantante nnd ra asthe chen ep el aay he espe A thease wae alates ay APM rape neve, eyo eee 
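
The tabulated capacitances are consistent with a simple parallel-plate relation, C = ε0 ε_hBN / d. The Python sketch below reproduces a few entries under the assumption ε_hBN = 4; this permittivity is not stated in the text and is chosen here only because it matches the legible table values.

```python
# Sketch: geometric areal capacitance of an h-BN dielectric (parallel-plate model).
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
EPS_HBN = 4.0             # ASSUMED relative permittivity of h-BN


def areal_capacitance(thickness_nm, eps_hbn=EPS_HBN):
    """Capacitance per unit area (F/m^2) for a dielectric of the given thickness."""
    return EPS0 * eps_hbn / (thickness_nm * 1e-9)


# Legible (thickness in nm, tabulated capacitance in 1e-3 F/m^2) pairs from the table.
table_entries = [(6, 5.9), (28, 1.3), (12, 3.0), (20, 1.8), (8, 4.4), (25, 1.4)]
for thickness, tabulated in table_entries:
    computed = areal_capacitance(thickness) * 1e3
    print(f"{thickness:>4} nm: computed {computed:.2f}, tabulated {tabulated}")
```

Agreement to within rounding for these entries suggests that the table was generated with a permittivity close to 4, though the exact value used by the authors is not recoverable from the text.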
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Room-temperature electrical control of exciton flux 
in a van der Waals heterostructure 


Dmitrii Unuchek, Alberto Ciarrocchi, Ahmet Avsar, Kenji Watanabe, Takashi Taniguchi & Andras Kis


Devices that rely on the manipulation of excitons (bound pairs of electrons and holes) hold great promise for realizing efficient interconnects between optical data transmission and electrical processing systems. Although exciton-based transistor actions have been demonstrated successfully in bulk semiconductor-based coupled quantum wells, the low temperature required for their operation limits their practical application. The recent emergence of two-dimensional semiconductors with large exciton binding energies may lead to excitonic devices and circuits that operate at room temperature. Whereas individual two-dimensional materials have short exciton diffusion lengths, the spatial separation of electrons and holes in different layers in heterostructures could help to overcome this limitation and enable room-temperature operation of mesoscale devices. Here we report excitonic devices made of MoS2–WSe2 van der Waals heterostructures encapsulated in hexagonal boron nitride that demonstrate electrically controlled transistor actions at room temperature. The long-lived nature of the interlayer excitons in our device results in them diffusing over a distance of five micrometres. Within our device, we further demonstrate the ability to manipulate exciton dynamics by creating

Type-II band alignment in the WSe2–MoS2 heterostructure with intralayer (X) and interlayer (X) excitons. The red and blue areas represent the bands in the two materials and the heterobilayer. Positive and negative symbols indicate holes and electrons, respectively. b, Schematic depiction of the WSe2–MoS2 heterostructure, showing the heterobilayer encapsulated in hexagonal boron nitride (h-BN) and the top and bottom gates. The interlayer exciton has a permanent

electrically reconfigurable confining and repulsive potentials for
the exciton flux. Our results make a strong case for integrating 
two-dimensional materials in future excitonic devices to enable 
operation at room temperature. 

Solid-state devices use particles and their quantum numbers for their operation, with electronics being the ubiquitous example. The need to improve power efficiency of charge-based devices and circuits is motivating research into new devices that would rely on other principles. Candidates so far include spintronics and photonics. Excitons, electrically neutral quasi-particles formed by bound electrons and holes, can also be manipulated in solid-state systems. The development of such excitonic devices has so far been hindered by the absence of a suitable system that would enable room-temperature manipulation of excitons, limiting the expansion of the field. Here, we demonstrate room-temperature excitonic devices based on atomically thin semiconductors. These devices could open the way for wider studies and applications of exciton devices in the academic and industrial sectors. Many applications can be envisaged, because excitons could be used to efficiently couple optical data transmission and electronic processing systems. Although fast optical switches have already been


out-of-plane dipole moment p that allows manipulation via the electric field E_z. c, False-colour optical image of the device, highlighting the different materials. d, e, Spatial maps of photoluminescence at 670 nm (d) and 750 nm (e), corresponding to MoS2 and WSe2 intralayer excitonic resonances, respectively. Photoluminescence is quenched in the heterostructure area owing to efficient charge transfer. Scale bars, 5 µm. a.u., arbitrary units.

demonstrated, the comparably large size (about 10 µm) of such devices limits packing density. This can be overcome in excitonic devices, the characteristic size of which is determined by that of electronic field-effect transistors (FETs).

Owing to their finite binding energy E_b, excitons can exist up to temperatures of around T ≈ E_b/k_B, where k_B is the Boltzmann constant. In a conventional III–V semiconductor coupled quantum well with a size of a few nanometres, the relatively small binding energy of around 10 meV permits the observation of excitons only at cryogenic temperatures (less than 100 K). To reach higher temperatures, different materials are required. To this end, systems with higher E_b (in the range of tens of millielectronvolts) have been explored more recently, such as (Al,Ga)N/GaN (ref. 9) or ZnO. Two-dimensional semiconductors such as transition-metal dichalcogenides have even larger exciton binding energies, which can exceed 500 meV in some cases owing to strong quantum confinement. This could enable the realization of excitonic devices that operate at room temperature.
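
The temperature scale T ≈ E_b/k_B can be evaluated directly for the binding energies quoted above. The following Python sketch does so; it is only an order-of-magnitude guide, since the relation itself is approximate, and the function name is illustrative.

```python
# Sketch: characteristic temperature up to which an exciton of binding energy E_b survives.
K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant, eV/K


def exciton_temperature_scale(binding_energy_mev):
    """Return T ≈ E_b / k_B in kelvin for a binding energy given in meV."""
    return binding_energy_mev * 1e-3 / K_B_EV_PER_K


t_quantum_well = exciton_temperature_scale(10.0)    # conventional coupled quantum well
t_tmd = exciton_temperature_scale(500.0)            # transition-metal dichalcogenide
room_temperature_mev = K_B_EV_PER_K * 300.0 * 1e3   # thermal energy at 300 K, in meV

print(f"10 meV  -> {t_quantum_well:.0f} K")
print(f"500 meV -> {t_tmd:.0f} K")
print(f"k_B * 300 K = {room_temperature_mev:.1f} meV")
```

A 10 meV binding energy corresponds to a scale of roughly 100 K, whereas 500 meV lies far above the room-temperature thermal energy of about 26 meV, which is the contrast the paragraph draws.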

Although intralayer excitons have relatively short lifetimes (about 10 ps), the spatial separation of holes and electrons in interlayer excitons results in lifetimes more than two orders of magnitude longer, well in the nanosecond range. For the device presented here, we take advantage of interlayer excitons in an atomically thin MoS2–WSe2 heterostructure. Type-II band alignment (Fig. 1a) results in charge separation between the constituent materials, with electrons and holes residing in MoS2 and WSe2, respectively. The formation of indirect excitons is marked by the appearance of a new photoluminescence emission peak, redshifted by about 75 meV with respect to the intralayer exciton of the WSe2 monolayer. In Extended Data Fig. 1b we present a typical photoluminescence spectrum obtained from such a heterostructure on SiO2, in which the spectral signature of the interlayer exciton is clearly visible (dark blue line), together with the individual WSe2 and MoS2 monolayers (blue and red lines, respectively). Recent reports suggest that excitons in the MoS2–WSe2 system are not only spatially indirect, but also momentum-indirect owing to lattice mismatch. The phonon-assisted nature of the emission process further reduces the exciton recombination rate, yielding a longer lifetime. Such an extended lifetime can be used to obtain interlayer exciton diffusion over a scale of micrometres, even at room temperature.

To obtain a pristine surface, the heterostructure is encapsulated in hexagonal boron nitride and annealed in high vacuum. Multiple transparent top gates are fabricated out of few-layer graphene. This double-gate configuration allows us to apply a vertical electric field without changing the carrier concentration in the MoS2–WSe2 heterostructure. In Fig. 1c we show a false-colour optical micrograph of the resulting stack. We characterize the structure by using photoluminescence mapping at room temperature, under 647-nm excitation. In Fig. 1d, e and Extended Data Fig. 1 we show the intralayer emission distribution at the wavelengths characteristic of MoS2 (670 nm), WSe2 (760 nm) and the interlayer exciton (785 nm). Whereas individual monolayers appear to be homogeneously bright, emission from the heterostructure region is uniformly quenched by more than three orders of magnitude, owing to the efficient charge transfer between layers. Even with this strong quenching, we are able to detect the interlayer peak in the photoluminescence spectra (Extended Data Fig. 2), confirming the generation of interlayer excitons. Because this effect has a central role in our work, we fabricated three more heterostructures encapsulated in hexagonal boron nitride, confirming the reproducibility of this result (Extended Data Fig. 3).

Given that excitons do not carry a net electric charge, we do not 
expect their flow to be influenced by the direct application of an 
in-plane electric field. However, the confinement of oppositely charged 
carriers in different layers results in a well-defined interlayer-exciton 
dipole moment p with an out-of-plane (2) direction (Fig. 1b). Anelec 
tric field E.(c,y) perpendicular to the crystal plane can then be used 
to shift the exciton energy by ff = —p-E:, while its lateral modulation 
drives the exciton motion towards regions of lower energy. Exciton 
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Fig. 2 | Excitonic transistor operation at room temperature. a, The 
application of gate voltages (V_g1, V_g2, V_g3) to transparent graphene
electrodes (gates 1-3) can engineer a potential landscape for the diffusion
of excitons, controlling their flux through the device. b, c, Calculated
energy variation δE for the excitons in the ON (free diffusion; b) and
OFF (potential barrier; c) states. Red arrows represent laser excitation.
The bound charges and black dashed arrows denote the excitons and their
diffusion, respectively. d, e, Corresponding images of exciton emission.
Dashed lines indicate the positions of the different layers that form
the heterostructure and the top graphene gate (gate 1). The laser spot
is represented by the red circle. Colour scale indicates the normalized
photoluminescence intensity. Scale bars, 5 µm. f, Gate dependence of the
ON/OFF ratio for optical excitation 3 µm away from the emission centre
(left axis). The right axis shows the reference data, which were acquired
with the incident laser beam located directly on the emission centre
(input-output distance, d_i-o = 0 µm). The measured emission intensity
is normalized by the OFF-state value at V_g1 = 15 V. The background
shading indicates the ON (red) and OFF (grey) states. The blue dashed
line represents the gate voltage at which the barrier height is equal to the
thermal energy.
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Fig. 3 | Biasing of the excitonic device. a, b, Calculated energy profile δE
of the indirect exciton as a function of lateral coordinate x for the forward
(a) and backward (b) bias cases. The black solid line indicates the direction
of exciton drift. c, Image showing exciton emission from the device when
injecting at a distance d_i-o ≈ 5 µm from the emission area. Colour scale,
dashed lines and red circles as in Fig. 2d. Scale bar, 5 µm. d, Normalized


dynamics in the longitudinal direction can be modelled by a diffusion
equation with an external potential (see Methods):

$$D\nabla^2 n + \frac{D}{k_\mathrm{B}T}\nabla\cdot\left(n\nabla\varphi\right) - \frac{n}{\tau} + G = \frac{\partial n}{\partial t} \qquad (1)$$

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where n, D and τ are the interlayer-exciton concentration, diffusion
coefficient and lifetime, respectively, φ is the exciton potential (including
the electrostatic contribution φ_el = −p_z E_z) and G is the optical generation
rate. This simple model qualitatively shows how the application of an
electric field E_z can affect interlayer exciton diffusion, as we discuss
later.

We first demonstrate an electrically controlled excitonic switch, 
represented schematically in Fig. 2a. Laser light focused inside the
heterostructure area (input) generates interlayer excitons, which diffuse
along the channel of the heterostructure. However, the low brightness
of interlayer emission makes monitoring the operation of the device
challenging. For this reason, we use the exposed WSe2 that extends
out of the heterostructure as a bright emitter. Here, interlayer excitons
diffuse towards the edge of the heterostructure. During this diffusion
process, interlayer excitons are expected to dissociate into single
carriers, which are allowed to diffuse inside monolayer MoS2 and WSe2,
where they experience recombination with native charges, resulting
in bright emission. The emitted radiation is recorded simultaneously
using a charge-coupled device (CCD) camera and a spectrometer
(see Methods), to obtain spatial and spectral emission profiles. This
allows us to further confirm the presence and diffusion of interlayer
excitons inside the heterobilayer (Extended Data Fig. 2). In the absence
of applied fields (Fig. 2b), excitons diffuse away from the pumping
area (red circle in Fig. 2d), owing to temperature and concentration
gradients, and reach the recombination site, approximately 3 µm
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output intensity as a function of the distance d_i-o between optical injection
and the emission point, for the forward (red) and backward (blue) bias
configurations, compared to the unbiased case (grey). The grey shading
indicates the noise floor. Exciton diffusion over a distance of 5.5 µm is
achieved.


away. Comparison of pumping and emission profiles (Extended Data
Fig. 4) lets us exclude the possibility of a direct excitation of monolayer
WSe2 by the low-intensity tail of the laser spot. This situation (bright
output) is shown in the emission image in Fig. 2d and corresponds to
the ON state of the excitonic transistor. On the contrary, by introducing
a potential barrier higher than k_BT on the path of the diffusing excitons
(Fig. 2c), we impede their motion, resulting in the suppression of light
emission (Fig. 2e). In this way, we can achieve efficient electrical
modulation of the output emission, as shown in Fig. 2f, in which the emission
intensity (normalized by the value in the OFF state, corresponding to
V_g1 = +16 V) is plotted as a function of applied voltage. For reference,
we also plot the intensity modulation observed when the laser beam
is located on the emission centre (input-output distance d_i-o = 0 µm).
The switching threshold is around 8 V, which corresponds well with
the calculated exciton energy modulation of δE ≈ k_BT = 25 meV (blue
dashed line in Fig. 2f). This result is consistent with our model: because
the height of the energy barrier starts to become comparable to thermal
excitation, it is now possible to block the diffusion of exciton flux. We
extract an intensity ON/OFF ratio larger than 100, limited by the noise
level of the set-up in the OFF state (see also Extended Data Figs. 4, 5).
Such a high ratio results from the realization of an excitonic transistor
with complete suppression of emission in the OFF state. This effect is
also clearly visible in the spectrum of the emitted light, in which the
WSe2 peak is selectively suppressed when the device is in the OFF state
(Extended Data Fig. 6). We also note that strong emission from MoS2
is detected in both states, because excitons can diffuse freely in other
directions.
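
The comparison between the barrier height and the thermal energy quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the Boltzmann factor exp(−δE/k_BT) is used here as a crude Arrhenius-type estimate of how strongly a barrier suppresses thermally activated transport. It is an assumption made for illustration, not the analysis used in the paper.

```python
import math

# Boltzmann constant in eV/K (CODATA value).
K_B_EV_PER_K = 8.617333262e-5


def thermal_energy_mev(temperature_k: float) -> float:
    """Return the thermal energy k_B*T in meV at the given temperature."""
    return K_B_EV_PER_K * temperature_k * 1e3


def boltzmann_factor(barrier_mev: float, temperature_k: float) -> float:
    """Crude estimate (assumption): relative weight exp(-dE / k_B T) of
    excitons with enough thermal energy to pass a barrier of height dE."""
    return math.exp(-barrier_mev / thermal_energy_mev(temperature_k))


if __name__ == "__main__":
    # At room temperature k_B*T is close to the 25 meV quoted in the text.
    print(f"k_B T at 300 K: {thermal_energy_mev(300.0):.2f} meV")
    for barrier in (0.0, 25.0, 50.0, 100.0):
        print(f"barrier {barrier:5.1f} meV -> factor "
              f"{boltzmann_factor(barrier, 300.0):.3f}")
```

On this simple picture, the suppression becomes appreciable once the barrier is comparable to k_BT, which is the regime identified with the switching threshold in the text.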

An alternative mechanism that could in principle explain the
recombination far away from the excitation spot is based on the diffusion
of single carriers rather than interlayer excitons. It has been shown
that such carriers (holes in particular) can have long lifetimes.
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Fig. 4 | Electrically reconfigurable energy landscape. a-c, Calculated
energy profile δE of the indirect exciton for the cases of a potential well
(a), free diffusion (b) and a potential barrier (c). d-f, Imaging of exciton
emission for the configurations shown in a-c. Incident laser light (red
circle) is focused on top of gate 2. Dashed lines indicate positions of


However, experimental observations indicate that this is not the
dominant mechanism in our heterostructure. First, we observe the
production of interlayer excitons directly in the excitation area, even if
the intensity is low. Second, for a flux of single carriers, the voltage
modulation necessary to counteract thermal excitation and block the
single-particle flux would be about 50 mV, more than two orders of
magnitude lower than the gate voltage of approximately 8 V required in
our experimental result shown in Fig. 2. Finally, this mechanism would
also result in different emission profiles for different regimes of device
operation (see Extended Data Fig. 7).

To exclude the possibility that the observed effect arises from an
unwanted modulation of the charge carrier density in WSe2, we perform
a calibration experiment in which the excitation light is focused on the
output area (d_i-o = 0) and the device is biased as before. This reference
experiment is discussed in detail in Methods and the results are presented in Fig. 2f
(grey curve): it shows that only a comparably small modulation of WSe2
emission intensity is observed. This confirms that the energy barrier is the
origin of the switching behaviour. We study the dependence of the ON/OFF
ratio on d_i-o further (Extended Data Fig. 8) by keeping the voltage profile
constant and optically injecting excitons at different distances from the
output point. Consistent with our model, we observe efficient modulation
when the laser is focused beyond the energy barrier, with emission
intensity decreasing with increasing d_i-o, owing to long-distance diffusion. The
diffusion length can be doubled at lower temperature (4.7 K), resulting in
operation over a longer distance (Extended Data Fig. 9).

Having demonstrated that we can block or allow spontaneous 
exciton diffusion, we go further by creating a drift field in the desired
direction, in analogy with the source-drain bias of a conventional 
FET. We show this type of operation in Fig. 3, with all three electrodes 


different layers that form the heterostructure and the graphene top gate
2. Colour scale as in Fig. 2. Scale bars, 5 µm. g-i, Cross-section of the
intensity profile along the device channel, integrated over its width, for the
three configurations. The red-shaded underlay represents the profile of the
excitation laser.


used to create a potential ladder going upwards or downwards
with respect to the excitation point (Fig. 3a, b). When excitons
encounter a gradually decreasing energy profile (forward bias), their
diffusion is enhanced by a drift term, allowing us to operate the device
with a larger distance between optical input and output. As shown
in Fig. 3c, this regime of electrically assisted diffusion can result in
exciton transport over a distance of 5 µm. To obtain a more quantitative
estimate of the induced modulation, we measure the dependence
of the emission intensity on the distance from the laser spot as it is
displaced away from the output area at fixed gate voltages. The results
(Fig. 3d) show that the length over which excitons diffuse can be
effectively modulated from 5.5 µm to 3 µm, compared to about 4.5 µm
in the unbiased case. The modulation of the effective diffusion length
with the potential φ_el qualitatively follows the model introduced in
equation (1).

We further use the multi-gate configuration to demonstrate more
complex and electrically reconfigurable types of potential landscape
and related device operation. In Fig. 4a-c we present the energy profiles
calculated for free diffusion (Fig. 4b) compared with a potential well
(Fig. 4a) and a repulsive barrier (Fig. 4c) produced by the central gate
(gate 2), while the side gates (1 and 3) are kept grounded. In this case,
the position of the optical pump is centred on the middle electrode,
which corresponds to the centre of the well or barrier. In Fig. 4d, g we
show the CCD camera image and related emitted intensity profile along
the device channel for the case of the potential well. We observe
photoluminescence emission only from the narrow area below the central
contact, which is indicative of electrical confinement of the excitonic
cloud. Conversely, when applying a positive voltage to create a potential
hill (Fig. 4f, i), we see an expulsion of excitons from the pumping area




with the appearance of bright emission spots outside the middle section
of the device, owing to excitons drifting along the energy profile and
recombining on the edges of the heterostructure. This is evident from
a comparison with the free-diffusion case in Fig. 4e, h. Interestingly,
we also observe higher-energy emission from the neighbouring MoS2
monolayer parts inside the well in the case of exciton confinement. A
similar effect is also observed during exciton expulsion, with bright
spots appearing at the edges of the heterostructure around the repulsive
potential. Further inspection of the emission spectra from Fig. 4d, f
confirms this, with the intensity of monolayer peaks decreasing
(increasing) when confining (anti-confining) the excitons (Extended
Data Fig. 6). As also discussed in Methods, the observed MoS2
emission is affected by the local inhomogeneity of the substrate and by the
optical filters used. As discussed earlier, the diffusion of single particles
and their recombination with native charges that are available in the
monolayers could have a role in light emission that extends from the
edges of the heterobilayer into the monolayers.


Online content 
Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0357-y.
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METHODS 
Device fabrication. The heterostructure was fabricated using polymer-assisted
transfer (see Extended Data Fig. 10) of flakes of hexagonal boron nitride (h-BN),
WSe2 (HQ Graphene) and MoS2 (SPI). Flakes were first exfoliated on a polymer
double layer, as described previously. Once monolayers were optically
identified, the bottom layer was dissolved with a solvent and free-floating films
with flakes were obtained. These were transferred using a custom-built set-up
with micromanipulators to carefully align flakes on top of each other. During
the transfer process, the sharp edges of the flakes were aligned to obtain a twist
angle between the two crystal axes close to 0° (or 60°). However, in the case of
the MoS2-WSe2 heterobilayer, the alignment has been shown to be not critical for
the observation of interlayer excitons. This is due to the indirect (in reciprocal
space) nature of the transition and to the considerable lattice mismatch between
the two layers. Polymer residue was removed with a hot acetone bath.
Once completed, the stack was thermally annealed in high vacuum at 10^-6 mbar
for 6 h. Few-layer graphene flakes were obtained by exfoliation from graphite
(NGS) on Si/SiO2 substrates and patterned in the desired shape by electron-beam
lithography and oxygen plasma etching. After thermal annealing, the patterned
flakes were transferred on top of the van der Waals stack using a polymer-assisted
transfer and the entire structure was annealed again in high vacuum. Finally,
electrical contacts were fabricated by electron-beam lithography and metallization
(60 nm/2 nm Au/Ti).

Optical measurements. All measurements presented here were performed in
vacuum at room temperature unless specified otherwise. Excitons were optically
pumped by a continuous-wave 617-nm laser diode focused to the diffraction limit
with a beam size of about 1 µm. The incident power was 250 µW. The spectral
and spatial characteristics of the device emission were analysed simultaneously.
The emitted light was acquired using a spectrometer (Andor) and the laser line
was removed with a long-pass 650-nm edge filter. For spatial imaging we used a
long-pass 700-nm edge filter so that the laser light and most of the MoS2 emission
were blocked. Filtered light was acquired by a CCD camera (Andor Ixon). The
room-temperature photoluminescence spectrum of MoS2 shown in Extended Data
Fig. 1b was obtained under 10-µW excitation at 617 nm, whereas monolayer WSe2
and the heterostructure fabricated on SiO2 substrate were characterized under
488-nm excitation.

Owing to the small separation between the interlayer and the intralayer WSe2
exciton peaks, it is not possible to completely distinguish them in the images
acquired on the CCD. The tail of the WSe2 monolayer peak normally overlaps
with the spectral line of the interlayer exciton considerably, meaning that weak
luminescence around 785 nm can be observed even on monolayer WSe2 (see Extended
Data), which is not due to interlayer excitons.

Because of the use of the 700-nm filter, the emission from monolayer MoS2 is
in principle not observable on the CCD. However, some light can be transmitted
when the broadening of the photoluminescence peak results in a low-energy tail
(see Extended Data Fig. 1) extending beyond 700 nm. Local inhomogeneity in the
substrate can affect this broadening, which could explain why the observed MoS2
luminescence in Fig. 4 comes mostly from the left part of the device.

Low-temperature measurements (Extended Data Fig. 9) were performed in a
liquid-helium, continuous-flow cryostat (Oxford Instruments).

Reference experiment. We performed a reference experiment to exclude spurious
effects that could compromise the interpretation of the data. First, we observed how
the photoluminescence emission from monolayer WSe2 changes when gating the
device using the back gate. For this purpose, we excited the exposed WSe2 with the
laser beam directly and recorded the photoluminescence spectra. When applying
voltage to the back gate, modulation in the emission intensity is clearly
observable (Extended Data Fig. 12a). We repeated the same measurement, but instead
of applying a voltage between the flake and the back gate, we biased the top and
back gates, thus generating a vertical electric field inside the device. In this case, we
cannot observe any substantial change in the emission intensity (Extended Data
Fig. 12b). This allows us to rule out the possibility that the switching action that
we observe could be due to suppression of photoluminescence from a changing
doping level in the material.

Image processing. To aid the interpretation of images from the CCD camera,
we performed several image-processing steps using ImageJ. We first subtracted
from the original image a background image obtained without laser illumination,
to account for ambient light noise. In some cases, a simple background was not
sufficient to compensate for the presence of spurious signal from unwanted
reflections or changing ambient background. In these cases, a background image was
generated by applying the rolling-ball algorithm in ImageJ. Contrast was adjusted
to cover the range of values in the image. We provide an example of the procedure
in Extended Data Fig. 13.
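
The two simplest steps of the procedure described above (subtracting a dark frame, then stretching contrast to the range of values in the image) can be sketched in plain Python. This is a minimal illustration under stated assumptions: images are nested lists of numbers, the function names are hypothetical, and the rolling-ball background estimation performed in ImageJ is not reproduced here.

```python
def subtract_background(image, background):
    """Pixel-wise subtraction of a dark frame, clipping negatives to zero."""
    return [
        [max(pixel - dark, 0.0) for pixel, dark in zip(image_row, dark_row)]
        for image_row, dark_row in zip(image, background)
    ]


def stretch_contrast(image):
    """Linearly rescale pixel values so they span the range 0..1."""
    flat = [pixel for row in image for pixel in row]
    lowest, highest = min(flat), max(flat)
    if highest == lowest:
        # A flat image carries no contrast; map everything to zero.
        return [[0.0 for _ in row] for row in image]
    scale = highest - lowest
    return [[(pixel - lowest) / scale for pixel in row] for row in image]


if __name__ == "__main__":
    # Toy 2x2 frames with made-up counts, for illustration only.
    raw = [[10.0, 12.0], [30.0, 11.0]]
    dark = [[10.0, 10.0], [10.0, 10.0]]
    print(stretch_contrast(subtract_background(raw, dark)))
```

Clipping at zero avoids negative counts where the dark frame is noisier than the signal; whether that choice matches the ImageJ workflow used for the published images is not stated in the text.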

Modelling exciton diffusion. The dynamics of the exciton in the channel of our
device can be modelled by one-dimensional diffusion in the presence of an external
potential φ(x) (temperature, electrostatic potential or dipole-dipole interaction).
The gradient of exciton concentration n(x) drives the diffusion current j_diff, while the
potential gradient causes the drift current j_drift:

$$j_\mathrm{diff} = -D\frac{\partial n}{\partial x}, \qquad j_\mathrm{drift} = -\mu n\frac{\partial \varphi}{\partial x}$$

where μ is the exciton mobility, which is related to the diffusion coefficient D and
the thermal energy k_BT by the Einstein relation D = μk_BT. We also include an
exciton generation rate G by means of optical pumping and an exciton
recombination rate R, which relates to the exciton lifetime τ as R = −n/τ. From the exciton
continuity equation we then obtain equation (1).

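In our system, in which excitons have a built-in vertical dipole moment p_z, the
electrostatic potential induced by the vertical electric field is φ_el = −E_z p_z. Because
we use continuous-wave excitation, we assume a steady-state case (∂n/∂t = 0).
Considering φ_el as the main contribution to exciton drift, we obtain

$$D\frac{\partial^2 n}{\partial x^2} - \frac{D p_z}{k_\mathrm{B}T}\frac{\partial}{\partial x}\left(n\frac{\partial E_z}{\partial x}\right) + G - \frac{n}{\tau} = 0$$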


We simplify the model further by assuming two fundamentally different regions,
shown in Extended Data Fig. 14. The first region is under constant homogeneous
excitation, so that the concentration reaches an equilibrium value with equal
recombination and generation rates (R + G = 0). The equilibrium concentration
is then n_0 = Gτ. Outside of the pumping region, excitons diffuse away, driven by
the concentration and potential gradients:

$$D\frac{\partial^2 n}{\partial x^2} - \frac{D p_z}{k_\mathrm{B}T}\frac{\partial}{\partial x}\left(n\frac{\partial E_z}{\partial x}\right) - \frac{n}{\tau} = 0$$
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The case of diffusion in the absence of an external field can be solved analytically,
revealing exponential decay of exciton density from the pumping region with a
characteristic distance that corresponds to the diffusion length l_D = √(Dτ).
An applied non-homogeneous vertical electric field can alter the diffusion
length (as demonstrated experimentally), which can be modelled as a change in
the effective diffusion length.
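
To make the idea of an effective diffusion length concrete, the sketch below treats one simplified case: a uniform field gradient outside the pumping region, so that the drift term reduces to a constant drift velocity v = μ p_z ∂E_z/∂x. This uniform-gradient assumption, the function names and the numerical values are illustrative choices, not taken from the paper. With n ∝ exp(−x/l), the steady-state equation D n'' − v n' − n/τ = 0 gives l² − vτ·l − Dτ = 0, whose positive root is returned below.

```python
import math


def diffusion_length(diffusivity: float, lifetime: float) -> float:
    """Field-free diffusion length l_D = sqrt(D * tau)."""
    return math.sqrt(diffusivity * lifetime)


def effective_decay_length(diffusivity: float, lifetime: float,
                           drift_velocity: float) -> float:
    """Decay length of n(x) ~ exp(-x / l) for D n'' - v n' - n/tau = 0.

    Assumes a constant drift velocity v (uniform field gradient).
    v > 0 pushes excitons away from the pump (forward bias),
    v < 0 pushes them back towards it (backward bias).
    """
    v_tau = drift_velocity * lifetime
    return 0.5 * (v_tau + math.sqrt(v_tau ** 2 + 4.0 * diffusivity * lifetime))


if __name__ == "__main__":
    # Illustrative numbers only: D in um^2/ns, tau in ns, v in um/ns.
    D, tau = 1.0, 16.0
    for label, v in (("backward", -0.1), ("unbiased", 0.0), ("forward", 0.1)):
        print(f"{label:8s}: l = {effective_decay_length(D, tau, v):.2f} um")
```

For v = 0 the expression reduces to l_D = √(Dτ). A drift along the diffusion direction lengthens the decay length and an opposing drift shortens it, which is the qualitative trend reported for forward and backward bias.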
Numerical simulation of the exciton-energy profile. We first calculate the
electric-field distribution in our system using the COMSOL Multiphysics simulation
software. All calculations were performed considering the dimensions of the device
as follows: the top graphene gates are 1 µm wide and spaced 0.8 µm apart. The
heterostructure is encapsulated between two h-BN crystals (10 nm thick on the top
and 20 nm on the bottom), and the substrate is heavily doped Si with 270 nm of
SiO2 on top (see Extended Data Fig. 15a). Extended Data Fig. 15b shows an example
of the electric field in the system in the confinement configuration, with −10 V
applied to the central gate and the side gates grounded. Interlayer excitons have a
built-in out-of-plane dipole moment directed upwards, with an absolute value of
p_z = ed ≈ e × 7.5 × 10^-10 m, where e is the elementary charge and d = 7.5 Å is the
layer separation in our heterostructure. They thus experience an energy shift of
δE = −p_z E_z in the presence of a vertical electric field E_z. The resulting force applied
on the exciton in the longitudinal direction is proportional to the first derivative of
the vertical electric field E_z with respect to the channel x axis:

$$F_x = -\frac{\partial(\delta E)}{\partial x} = p_z\frac{\partial E_z}{\partial x}$$

Example profiles of the confinement-well configuration are shown in Extended
Data Fig. 15c.

Data availability. The data that support the findings of this study are available
from the corresponding author on reasonable request.
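
The dipole energy shift δE = −p_z E_z used in the numerical simulation of the exciton-energy profile can be evaluated directly. The sketch below assumes the layer separation d = 7.5 Å quoted in Methods and a uniform vertical field; the function names are hypothetical and the field values are illustrative, not simulation output.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def dipole_moment_c_m(separation_nm: float = 0.75) -> float:
    """Dipole moment p_z = e * d in C*m for a layer separation d in nm."""
    return ELEMENTARY_CHARGE_C * separation_nm * 1e-9


def dipole_energy_shift_mev(field_v_per_nm: float,
                            separation_nm: float = 0.75) -> float:
    """Energy shift dE = -p_z * E_z in meV.

    With p_z = e*d, the shift in eV is numerically -(d [nm] * E_z [V/nm]).
    """
    return -separation_nm * field_v_per_nm * 1e3


def field_for_shift_v_per_nm(shift_mev: float,
                             separation_nm: float = 0.75) -> float:
    """Field magnitude (V/nm) needed for an energy shift of |shift_mev|."""
    return abs(shift_mev) / (separation_nm * 1e3)


if __name__ == "__main__":
    print(f"p_z = {dipole_moment_c_m():.3e} C m")
    print(f"shift at 0.1 V/nm: {dipole_energy_shift_mev(0.1):.1f} meV")
    print(f"field for 25 meV: {field_for_shift_v_per_nm(25.0):.4f} V/nm")
```

Under these assumptions, a shift comparable to the room-temperature thermal energy of about 25 meV corresponds to a vertical field of roughly 0.03 V/nm at the exciton.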


30. Mayorov, A. S. et al. Micrometer-scale ballistic transport in encapsulated
graphene at room temperature. Nano Lett. 11, 2396-2399 (2011).

31. Zhu, H. et al. Interfacial charge transfer circumventing momentum mismatch at
two-dimensional van der Waals heterojunctions. Nano Lett. 17, 3591-3598
(2017).

32. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years
of image analysis. Nat. Methods 9, 671-675 (2012).
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Extended Data Fig. 1 | Interlayer excitons in the WSe2-MoS2 van der
Waals heterostructure. a, Spatial map of photoluminescence at 785 nm,
corresponding to the heterostructure interlayer photoluminescence
emission maximum, as shown in the photoluminescence (PL) spectra in
b. An efficient interlayer charge-transfer process in the heterostructure
encapsulated in h-BN results in further quenching of photoluminescence
emission from the heterostructure. Scale bar, µm. b, Photoluminescence
spectra from the structure fabricated on SiO2.
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Extended Data Fig. 2 | Spectra of excitonic device emission.
a, Distribution of photoluminescence emission intensity from the device,
in the absence of an electric field. White dashed lines represent edges of
constituent crystals. Scale bar, 5 µm. b, Detailed spectrum of the emission
pattern, showing the interlayer exciton peak (X_I) and WSe2 intralayer
emission. We note that the low-energy peak (X_I) cannot be related to
localized excitons in WSe2, because they are observed only at cryogenic
temperatures. c, Full spectrum of the emission shown in a, also showing
the emission from MoS2, which is blocked by the filter in the CCD
image. The black dashed box indicates the range of energies shown in b.
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Extended Data Fig. 3 | Characterization of an additional WSe2-MoS2
heterostructure. a, False-colour optical image of the fabricated heterostructure
stack. b, Atomic force microscopy (AFM) height-profile image of the
heterostructure. c-e, Spatial maps of photoluminescence intensity
at emission wavelengths (λ) of 670 nm (c), 750 nm (d) and 785 nm
(e), corresponding to MoS2 intralayer, WSe2 intralayer and
interlayer (X_I) excitonic resonances. Photoluminescence
is quenched in the heterostructure area owing to efficient charge transfer.
White dashed lines represent edges of constituent crystals. Scale bars, 5 µm.
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Extended Data Fig. 4 | Exciton transistor input and output. a, Cross-sectional
profile of the device emission intensity along the white dashed
lines in b and c, obtained for different gate voltages V_g1 from 0 V (light
blue) to 16 V (black) with intermediate values of 4 V, 6 V and 8 V. The red
line represents the intensity profile of the laser spot. b, c, CCD image of
the exciton emission in the ON state (b) and of the focused laser spot (c).
The lengths of the dashed lines indicate 10 µm.



Extended Data Fig. 5 | Switching of the excitonic transistor. a-f, CCD images of exciton emission from the device, obtained for different gate voltages
V_g1 from 0 V to 10 V in steps of 2 V.


OFF (barrier) ON (free diffusion) 
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Extended Data Fig. 6 | Spectrum of light emitted from the device in 
different states. a, b, Intensity distribution of light emission from the
excitonic transistor in the OFF and ON states (left and right, respectively;
a) and the corresponding spectra collected from the entire device (black
and red, respectively; b). c, d, Intensity distribution of light emission from
the excitonic device in the confinement and expulsion configurations (left
and right, respectively; c) and the corresponding spectra collected from
the entire device (black and red, respectively; d).
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Single-particle transport hypothesis 
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Extended Data Fig.7 | Schematic depiction of the control over the light 
emission. a, b, Energy profiles for electrons (red) and holes (blue) in the
device when applying negative (a) or positive (b) voltage on the central
gate (V_g2). c, d, Corresponding expected emission images under the
single-particle assumption. e, f, Energy profiles of interlayer excitons (ILE) in
the presence of an external electric field, under the same conditions as in
a and b. g, h, Corresponding experimental results. Scale bar, 5 µm. The
figures in a-d are schematics based on the hypothesis that after the fast
interlayer charge transfer, photo-excited carriers move independently
rather than being bound in interlayer excitons. The diffusion of single
electrons and holes is then subject to the type-II band alignment between
MoS2 and WSe2, which restricts the motion of electrons to MoS2 and holes
to WSe2. This charge separation is very efficient, as indicated by the strong
suppression of intralayer emission from the heterostructure (Fig. 1e).
Once the separation occurs, it is not very likely that the charges can
hop between the layers: the band difference between MoS2 and WSe2 is
more than 200 meV, so thermal excitation of 25 meV will not be enough
for electrons to jump back to WSe2 and holes to jump back to MoS2.
Another thing to consider is the local electrostatic potential defined by
the gate. The application of V_g2 < 0 creates a confining energy profile for
single holes and a repulsive one for single electrons, as in a and c. Holes
would then be confined in the WSe2 area under the gate while electrons
would be pushed out to MoS2 areas next to the gate, where they would
recombine with charges already present in the monolayer area, resulting
in photoluminescence from single-layer areas of MoS2 next to the gate
(provided that there are enough holes in MoS2 to start with). We would
then expect to obtain the emission pattern shown in c, assuming the
presence of native holes in MoS2. In their absence, we would expect to see
only one emission spot, coinciding with the excitation laser spot. Along
the same lines, applying positive gate voltage to the middle gate (V_g2 > 0)
would result in a repulsive potential for holes in WSe2 and an attractive
one for electrons in MoS2. Recombination would then occur for electrons
in MoS2 in regions under the gate and for holes in WSe2 in regions outside
the gate, as shown in d. This is in contradiction with the experimental
observations in g, h. In the case of interlayer exciton transport, we instead
have only a single energy profile (e, f), and the application of a positive
gate voltage results in the expulsion of interlayer excitons.
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Extended Data Fig. 8 | Excitonic transistor characterization for
different positions of the excitation laser spot. a, Normalized emission
intensity (transistor output) as a function of the distance between optical
injection and the emission point d_i-o, which is the same as in Fig. 3,
shown for the ON (blue, V_g1 = 0 V) and OFF (black, V_g1 = 16 V) states.
b, Transistor efficiency calculated as the ratio between output emission in
the ON and OFF states for different input-output separation distances d_i-o.
Efficiency reaches a maximum when the laser spot is moved completely
beyond the gate, so that the energy barrier stays between the input and the
output and thus effectively modulates exciton diffusion.
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Extended Data Fig. 9 | Characterization of the device at low
temperatures. a, Normalized output intensity as a function of the
distance between optical injection and emission points, obtained at room
temperature (red, 300 K) and 4.7 K (blue). No electric field is applied.
b, Emission images of the device in the ON (top) and OFF (down)
states when measured at 4.7 K, with input-output separations as long as
d_i-o = 5.1 µm. Such long-distance transistor switching was not observed at
room temperature for this sample.
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Extended Data Fig. 10 | Heterostructure fabrication. Optical images
taken during different fabrication steps: a, exfoliation of the bottom h-BN
(b-hBN); b, transfer of a monolayer MoS2 flake; c, transfer of a monolayer
WSe2 flake; d, encapsulation with top h-BN (t-hBN); e, transfer of
pre-patterned few-layer graphene stripes (Gr); and f, metallization of
Au/Ti contacts. The image in e is shown in black and white for better
visibility of the final structure. Scale bar, 10 µm (applies to all images).
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Extended Data Fig. 11 | Variation in photoluminescence emission 
from MoS2 due to the inhomogeneity of the substrate. a, Image of
photoluminescence emission coming from the device in the repulsive
configuration shown in Fig. 4f. b, Micro-photoluminescence (µPL) spectra
from the areas marked by red and blue circles in a, showing different peak
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widths as a result of local inhomogeneity in the heterostructure. The grey
shaded area is the part of the spectrum cut by the 700-nm long-pass filter.
As can be clearly seen in the image, areas where MoS2 photoluminescence
shows a low-energy tail due to broadening become visible to the CCD (left
side of the device), whereas the other areas appear dark (right side).
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Extended Data Fig. 12 | Reference experiments. a, Photoluminescence 
spectra from monolayer WSe2 at different back-gate voltages. Substantial
modulation of the emission intensity is observed. b, Photoluminescence
spectra from monolayer WSe2 when using top and back gates in the
dual-gated configuration, for the voltage range used in the experiment
presented in Fig. 2. No appreciable intensity modulation is observed.
Both measurements are performed on the same WSe2 flake with the same
continuous-wave excitation at 647 nm and 200 µW of incident power.
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Extended Data Fig. 13 | Image post-processing. a, Original CCD image
of the exciton emission for the configuration shown in Fig. 3a. The yellow
square highlights the area of interest, shown in Fig. 3c. b, The same image
after background subtraction. c, Original CCD image of the exciton
emission for the configuration shown in Fig. 3b. d, The same image after
background subtraction. Scale bars, 15 µm.




Extended Data Fig. 14 | Modelling of exciton diffusion. Schematic of
exciton generation in the pumping area (x < 0) and diffusion outside
(x > 0), represented by the exciton concentration n(x). Constant pumping (G)
together with the recombination rate (R = −n/τ) establish the exciton concentration
n_0 in the pumping region. The concentration gradient outside the
pumping area generates a current that drives exciton diffusion.
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LETTER 


[Plot panels a-c; horizontal axes: x (µm).]
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electric field (black, left axis) and the electrostatic potential (red, right
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Cryo-STEM mapping of solid-liquid interfaces and 
dendrites in lithium-metal batteries


Michael J. Zachman’ 


Solid-liquid interfaces are important in a range of chemical, physical
and biological processes but are often not fully understood
owing to the lack of high-resolution characterization methods
that are compatible with both solid and liquid components. For
example, the related processes of dendritic deposition of lithium
metal and the formation of solid-electrolyte interphase layers are
known to be key determinants of battery safety and performance
in high-energy-density lithium-metal batteries. But exactly what
is involved in these two processes, which occur at a solid-liquid
interface, has long been debated because of the challenges of
observing such interfaces directly. Here we adapt a technique that
has enabled cryo-transmission electron microscopy (cryo-TEM) of
hydrated specimens in biology: immobilization of liquids by rapid
freezing, that is, vitrification. By vitrifying the liquid electrolyte we
preserve it and the structures at solid-liquid interfaces in
lithium-metal batteries in their native state, and thus enable structural and
chemical mapping of these interfaces by cryo-scanning transmission
electron microscopy (cryo-STEM). We identify two dendrite types
coexisting on the lithium anode, each with distinct structure and
composition. One family of dendrites has an extended
solid-electrolyte interphase layer, whereas the other unexpectedly consists
of lithium hydride instead of lithium metal and may contribute
disproportionately to loss of battery capacity. The insights into the
formation of lithium dendrites that our work provides demonstrate
the potential of cryogenic electron microscopy for probing nanoscale
processes at intact solid-liquid interfaces in functional devices such
as rechargeable batteries.

Accurate high-resolution characterization of electrode-electrolyte
interfaces is challenging owing to the volatility of commonly used
liquid electrolytes, the high chemical reactivity of metal anodes such as
lithium and the fact that the region of interest is an interface between
two condensed phases of matter. To address this, the liquid is typically
removed and the electrode of interest washed and dried before being
characterized using traditional methods, which alters the structure
and chemistry of the solid-liquid interface. Here, we use cryogenic
techniques originally designed for preserving hydrated biological
specimens, coupled with cryo-focused ion beam (cryo-FIB) and analytical
cryo-scanning transmission electron microscopy (cryo-STEM)
techniques, to access the intact structure and chemistry of dendrites and
their interphases in lithium-metal batteries down to the nanoscale.

Figure 1a shows a schematic of the symmetric lithium-metal coin
cells used for these experiments (for additional details, see Methods).
To preserve the electrolyte on the electrode surface, the cells were
opened and the electrode immediately plunge-frozen in a cryogen. To
explore the morphology of the anode surface rapidly, we used cryo-FIB
to mill a series of cross-sections through structures large enough to be
localized by raised regions in the frozen electrolyte (Fig. 1b). We imaged
each successive cross-section (Fig. 1c, d), revealing two distinct deposit
morphologies, which we refer to as type I and type II dendrites. Type
I dendrites are roughly 5 µm across with low curvature, whereas type
II dendrites are generally hundreds of nanometres thick and tortuous.


Zhengyuan Tu, Snehashis Choudhury, Lynden A. Archer & Lena F. Kourkoutis


We did not observe any spatial correlations between dendrite types,
nor cases where one dendrite type clearly formed on the other. To
gain insight into their three-dimensional (3D) morphology, we
reconstructed the 3D structure of the dendrites from the cross-sectional
images (Fig. 1e), as has been demonstrated previously for biological
samples. The electrode contact areas for the individual structures
can thereby be compared directly, revealing that the widths of type II
dendrite contact areas are more than an order of magnitude smaller
than those of type I dendrites. This suggests that type II dendrites may
become disconnected from the electrode more easily during battery
cycling and, in combination with their approximately equal numbers
(Fig. 1f) and volumes, contribute disproportionately to capacity fade
owing to electrochemically disconnected ('dead') lithium.

Although cryo-FIB techniques alone provide valuable morphological
information, we used cryo-STEM and electron-energy-loss
spectroscopy (EELS) to obtain high-resolution structural and chemical
information about the dendrites and their associated solid-electrolyte
interphase (SEI) layers. Electron-transparent cross-sectional lamellae
were extracted from plunge-frozen anode-electrolyte interfaces using
cryo-FIB lift-out (Fig. 2a, b, Extended Data Fig. 1). High-angle annular
dark-field (HAADF) cryo-STEM imaging immediately revealed an
extended SEI layer on the type I dendrite approximately 300-500 nm
thick, which was not present on the type II dendrite (Fig. 2c, d). The SEI
layer on lithium-metal battery anodes is generally thought to be tens of
nanometres thick. Our results suggest that a soft, extended portion of
the SEI is removed by the typical washing and drying sample-preparation
steps. The remaining SEI material observed by these techniques would
then be a thin, compact layer. This is important, in part because it
means substantially more lithium is irreversibly lost to the SEI layer
than previously thought.

Spectroscopic mapping by EELS shows that the extended type I SEI
has an increased concentration of oxygen and lithium compared to
the electrolyte and contains essentially no fluorine (Fig. 2e). Although
no extended type II SEI is present, there is a thin carbon-free, lithium-
and oxygen-rich layer on the type II dendrite surface, approximately
20 nm thick (Fig. 2f). Roughly spherical structures up to micrometres
in size, containing carbon, oxygen, lithium and an elevated level of
fluorine (Fig. 2e), were frequently observed near both dendrite types
(shown near the type I dendrite in Fig. 2a, c) but rarely elsewhere in
the sample. Individual elemental maps, including nitrogen, are shown
in Extended Data Fig. 2.

By analysing the fine structure of the carbon K-edge using
multivariate curve resolution, we observe distinct carbon-bonding
environments in the electrolyte, SEI and fluorine-rich structure (Fig. 3a, b,
Extended Data Fig. 2). The increased intensity from C=O bonds in
carbonates in the SEI, along with the increased oxygen content and
reduced concentration of C-H bonds, is consistent with evidence that
the SEI consists largely of lithium ethylene dicarbonate in ethylene
carbonate-based electrolytes. In addition, ethylene gas is produced
during the formation of lithium ethylene dicarbonate from ethylene
carbonate, which may explain the large bubbles observed in the
Fig. 1 | Characterization of dendrite morphologies by cryo-FIB.
a, Coin-cell arrangement used. b, Raised regions in the electrolyte frozen
on opened coin-cell electrodes reveal buried dendrite locations. The
electrolyte surface was sputter coated with a thin metal layer for increased
conductivity. c, d, Two distinct dendrite morphologies, referred to as
type I (c) and type II (d), were observed in serial cross-sectional images

SEI. The correlation of fluorine and C=C bonds in the fluorine-rich
structure may indicate that ethylene is bound to LiF here, the possibility
of which has been discussed previously. Although no substantial
changes in carbon bonding are observed at the type II dendrite
surface (Fig. 3c, d), the fine structure of the oxygen K-edge of the thin
type II SEI is consistent with lithium oxide and hydroxide monohydrate
(Extended Data Fig. 3a).

EELS of the dendrite interiors shows that the type I dendrite contains
an appreciable quantity of oxygen, whereas the type II dendrite does
not. The two dendrite types are therefore distinct in composition as well
as morphology. The fine structures of the lithium and oxygen K-edges
(Fig. 4a, b) reveal that the type I dendrite is composed primarily of
lithium metal, partially oxidized. The lithium K-edge of the type II
dendrite, however, unexpectedly corresponds to pure lithium hydride.
Although hydrogen gas is known to be prevalent in cycled lithium
batteries, only small amounts of LiH have been observed on freshly
exposed lithium put in contact with organic electrolytes. However,
appreciable quantities have been observed on metal oxide conversion
cathodes in lithium-ion batteries. Although LiH is only metastable
in electrolytes because it reacts rapidly with trace moisture and solvent
molecules to form LiOH and Li₂O, these reactions may result in a thin
passivating layer on the surface of large LiH structures that preserves
the interior material, consistent with the type II dendrite.

Although it might be tempting to assume that the hydrogen required
to produce the type II dendrites originates solely from reduction of
water impurities in the electrolyte, it was shown recently that the
decomposition of electrolyte solvent molecules can produce many
times more hydrogen than water impurities provide. By assuming a
type II dendrite volume of about 300 µm³, as determined by our cryo-
FIB measurements, a quick calculation reveals that the maximum
density of type II dendrites that could form in our cells as a result of
hydrogen from water in the electrolyte (less than 10 p.p.m. H₂O) should
be roughly an order of magnitude lower than what is actually observed.
This suggests that the electrolyte solvent molecules may also be
contributing hydrogen. Potential pathways to hydrogen production exist for
any hydrogen-containing electrolyte and therefore all lithium-metal
batteries that use traditional organic electrolytes are likely to suffer from
LiH dendrite formation and the associated capacity fade. However, the
produced by cryo-FIB and cryo-scanning electron microscopy
(cryo-SEM). e, Three-dimensional reconstructions of the dendrite
structures highlight the morphological differences. f, Roughly equal
numbers of the two morphologies were present across many coin cells.
Error bars represent dendrites that were not unambiguously identified.
Scale bars, 2 µm (b-d), 5 µm (e).


rate of hydrogen production from solvent molecules is a strong
function of cell voltage, so full-cell batteries that use higher-voltage
cathodes would produce much larger quantities of hydrogen. This would
exacerbate the problem of LiH dendrites in these cells, especially when
5-V-class high-voltage cathode materials that are designed to improve
energy density are used.
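
The water-impurity estimate described above can be sketched as a simple mass balance. The following is a minimal illustration, not the authors' calculation: it assumes that p.p.m. means parts per million by mass, that every hydrogen atom in the water ends up in LiH (an upper bound), and that the dendrite volume is about 300 µm³. The electrolyte mass used in the usage note, the function name and the handbook constants are all assumptions introduced here.

```python
# Upper bound on the number of LiH dendrites that water impurities could supply.
# All constants are approximate handbook values (assumptions, not from the paper).
M_H2O = 18.015    # g/mol, molar mass of water
M_LIH = 7.95      # g/mol, molar mass of lithium hydride
RHO_LIH = 0.78    # g/cm^3, approximate density of lithium hydride


def max_lih_dendrites_from_water(electrolyte_mass_g,
                                 water_ppm=10.0,
                                 dendrite_volume_um3=300.0):
    """Return the largest number of LiH dendrites that the water could feed.

    electrolyte_mass_g  -- mass of electrolyte in the cell (hypothetical input)
    water_ppm           -- water content in p.p.m. by mass (assumed definition)
    dendrite_volume_um3 -- volume of one type II dendrite in cubic micrometres
    """
    # Each H2O molecule carries two hydrogen atoms; assume all of them form LiH.
    mol_hydrogen = 2.0 * electrolyte_mass_g * water_ppm * 1e-6 / M_H2O
    # 1 um^3 = 1e-12 cm^3, so convert the dendrite volume before using density.
    dendrite_volume_cm3 = dendrite_volume_um3 * 1e-12
    mol_lih_per_dendrite = dendrite_volume_cm3 * RHO_LIH / M_LIH
    return mol_hydrogen / mol_lih_per_dendrite
```

Calling `max_lih_dendrites_from_water(0.1)` for a hypothetical 0.1 g of electrolyte gives a few thousand dendrites per cell; dividing by the electrode area would give a density to compare against the observed one.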

We additionally examined the plasmon resonances of the materials by
simultaneously acquired low-loss EELS (Fig. 4c-e). The type I dendrite
plasmon suggests that the lithium is only partially oxidized, because
appreciably oxidized lithium forms additional resonances at 18 eV and
30 eV, which we did not observe. The approximately 13-eV shoulder
on the peak corresponding to the type II dendrite also provides
further evidence for the presence of LiH, because the hydrogen K-edge
is found at 13.6 eV. Using the distinct low-loss spectra of the two
dendrite materials, we map their spatial distribution within the dendrites
(Fig. 4c, d). This mapping demonstrates that small LiH regions are also
present on the surface of the type I dendrite and that a lithium particle
is present on the tip of the type II dendrite. These results are
summarized in Extended Data Table 1 and Extended Data Fig. 2.

The lithium particle on the tip and the uniformity of the LiH within
the type II dendrites, as well as their aspect ratio, suggest a root or tip
growth mode. Grain boundaries have been shown to increase hydrogen
diffusion through some metals, and hydrogen selectively penetrating
into an electrode grain boundary or a lithium-particle-electrode
interface could initiate LiH formation. The resulting volume expansion
would lift the particle or electrode grain away from the electrode
surface, leading to the observed lithium tip on the dendrite (Fig. 4).
Growth would then probably proceed mainly at the dendrite-electrode
interface, owing to the poor electrical conductivity of LiH. Although
the formation of LiH is reversible, the reverse reaction would also
occur primarily at the base. Furthermore, LiH is much more brittle than
lithium metal. These facts, combined with the low electrode contact
area of the type II dendrites, suggest that the type II dendrites become
disconnected from the electrode more easily during cycling than do
type I dendrites. Because the total volume and number of dendrites are
comparable for both types, type II dendrites may, therefore, contribute
disproportionately to capacity fade by orphaned or disconnected
lithium. The thickness of the type I SEI layer also means that it contributes
Fig. 2 | Structure and elemental composition of dendrites and their
interphase layers in electron-transparent lamellae. a, b, Electron-
transparent cryo-FIB lift-out lamellae of type I (a) and type II (b)
dendrites. c, d, HAADF cryo-STEM imaging reveals an extended SEI layer
on the type I dendrite (c) but not on the type II dendrite (d). e, f, EELS
elemental mapping shows that both SEIs are oxygen-rich, but that the type
II SEI contains no carbon (contrast has been adjusted for clarity; raw data
are shown in Extended Data Fig. 2). The type I dendrite has an appreciable
oxygen content (e), whereas the type II dendrite does not (f). Fluorine-rich
structures were often observed near both dendrite types. Scale bars, 1 µm
(a, b), 300 nm (c-f).




more to the loss of lithium material than previously thought; we
estimate, however, that an order of magnitude more lithium is contained
within the type II dendrites than in the type I SEI layers. Minimizing
the formation of LiH dendrites is therefore critical to improving the
longevity of lithium-metal batteries.

Our results suggest that preventing the formation of type II
dendrites may hinge on a careful choice of solvents and salts to eliminate
hydrogen-containing species in the electrolyte and to form interphase
layers that are better able to protect the anode. One way to achieve these
goals would be to replace hydrogen in the solvent molecules with other
elements to generate alternative species in the electrolyte. For example,
we hypothesize that a properly chosen material would result in a
hydrogen-deprived and fluorine-rich environment in which sacrificial,
Fig. 3 | Analysis of the carbon-bonding environment near the dendrites.
a, Spatial variation of the fine structure of the carbon K-edge near a
type I dendrite, obtained from EELS, showing distinct carbon-bonding
environments, as determined using multivariate curve resolution (MCR;
contrast has been adjusted for clarity; raw data are shown in Extended
Data Fig. 2). b, The increased carbonate C=O peak is consistent with a
lithium ethylene dicarbonate SEI layer. Ethylene gas produced during SEI
formation may explain the SEI bubbles seen in a and the C=C peak in the
fluorine-rich structure. c, d, Two carbon components were present in the
electrolyte around the type II dendrite, but were not localized along the
dendrite surface. Scale bars, 300 nm.


low-stability-window fluorinated components preferentially react at
the electrodes relative to carbonate solvents, resulting in fluorine-rich
species in the electrolyte. This would both starve the system of
hydrogen and promote the formation of a beneficial LiF-rich barrier layer on
the surface of the anode, stunting the growth of LiH dendrites and
reducing capacity fade. Although recent studies that used high-concentration
full-fluoride electrolytes are consistent with our predictions,
maintaining the high fluorine-donating salt content of such electrolytes is
currently impractical because of the high cost of the salt. There is
therefore reason to explore alternative hydrogen-free electrolyte materials
that are as effective using different salts at lower concentrations.

As a test of the above hypothesis, we performed cryo-FIB,
cryo-STEM EELS and electrochemical experiments on cells prepared with
a full-fluoride-type electrolyte using a lower salt concentration of a less
expensive salt (2 M LiPF₆) than in previous reports; a fully fluorinated
solvent, fluoroethylene carbonate, was used in these electrolyte
compositions. The results are shown in Extended Data Fig. 4. Consistent with
our hypothesis, we found that this fluorinated electrolyte suppresses the
formation of LiH dendrites substantially and greatly alters the lithium
deposition. Localized structures are still present, but rather than type
I dendrites the deposits are much larger, with an inner structure
consisting of many smaller 'blocks' separated by SEI layers. These blocks
are composed of partially oxidized lithium, as with the type I dendrites,
which was confirmed by cryo-STEM EELS of structures prepared by
cryo-FIB lift-out. The electrochemical performance was correspondingly
enhanced, with higher Coulombic efficiency and greatly reduced
capacity fade, as shown in a lithium versus stainless steel set-up and in
a full-cell battery that used a nickel manganese cobalt oxide cathode.
Although this demonstrates the feasibility of dendrite suppression and
improved battery performance by introducing fluorinated electrolytes,
the concepts outlined above could potentially be further developed
by integrating the alternative electrolyte species into cross-linkable
structures, such as those reported recently. This would result in a
Fig. 4 | Determination and mapping of dendrite composition.
a, b, Comparison of the fine structures of the lithium (a) and oxygen (b)
K-edges of the dendrites (coloured lines) with reference materials (black)
reveals that the type I dendrite was partially oxidized lithium metal,
whereas the type II dendrite was uniform lithium hydride. c-e, Mapping
of the lithium metal and hydride low-loss EELS spectra (c) reveals that
the type I dendrite was only slightly oxidized and had small LiH regions at
its surface (d), whereas the type II dendrite had a lithium particle tip (e)
(contrast has been adjusted for clarity; raw data are shown in Extended
Data Fig. 2). The arrow in c denotes the hydrogen K-edge that appears at
about 13 eV in LiH. Scale bars, 300 nm.


hydrogen-deprived and halide-rich electrolyte environment that
simultaneously forms lithium-halide-rich and elastic interphases that are able
to flex to accommodate changes in the volume of the lithium anode,
thus providing further protection against both LiH and traditional
lithium-metal dendrites. Cryo-FIB and analytical cryo-STEM
techniques have thus provided us access to the nanoscale structure
and composition of intact solid-electrolyte interfaces in lithium-metal
batteries, revealing the existence of LiH dendrites and extended SEI
layers on lithium-metal dendrites, and allowed us to propose pathways
to overcoming their detrimental effects.
METHODS
Instrumentation and experimental details. We used an FEI Strata 400 S
DualBeam focused ion beam/scanning electron microscope system (FIB/SEM) to
characterize and prepare samples. It was fitted with a Quorum PP3010T cryo-SEM/
FIB system, which included a liquid nitrogen cold stage and an anticontaminator in
the main FIB chamber, a preparation chamber (for sputter coating) with a separate
cold stage and an anticontaminator attached to the FIB and separated by a valve,
a stand-alone workstation for freezing and loading samples onto the specimen
shuttle, and a vacuum transfer device for transporting samples between the
workstation and preparation chamber. In addition, we installed an Oxford OmniProbe
Cryoshaft on our OmniProbe 200 nanomanipulator, which is thermally isolated
from the room-temperature shaft by a ceramic section and cooled by a copper braid
attached to the anticontaminator. Preparation of lamellae by cryo-FIB lift-out was
carried out using techniques described previously. All milling was performed at
an ion beam voltage of 30 kV. Trenches to form the initial lamella were generally
milled with a beam current of a few nanoamps. Thinning of the lamellae was first
conducted with a beam current of hundreds of picoamps, decreasing with lamella
thickness to a final thinning with tens of picoamps. After cryo-FIB lift-out
preparation, cryo-STEM samples were transferred back into liquid N₂ in the workstation,
where they were loaded into cryogenic sample storage boxes and transferred to a
large liquid N₂ storage dewar.

Cryo-STEM characterization of these samples was performed on an aberration-
corrected FEI Titan Themis operated at 300 kV. The microscope was equipped
with an X-FEG high-brightness gun and a high-resolution Gatan imaging filter
(GIF Quantum 965) for EELS. Standard Gatan side-entry cryo-transfer holders
(model 626 and model 915) enabled transfer of the samples into the microscope
and maintained their temperature near -180 °C throughout the experiment. The
samples were loaded into the holder under liquid N₂ to minimize ice
contamination. During transfer into the vacuum of the microscope the sample was enclosed
by a cryo-shutter, which minimized ice build-up. No EELS edges beyond those
discussed in the main text were observed in the dendrites between the Li K-edge
at 55 eV and about 730 eV, including nitrogen, which rules out reactions with air
or liquid N₂ during specimen preparation and transfer (Extended Data Fig. 5).
While throughput of the cryo-FIB lift-out/cryo-STEM workflow is continuing to
be improved, it can approach that of room-temperature FIB and STEM techniques
with proper optimization. Multiple lamellae were prepared for analysis by cryo-
STEM for this project. Additional examples are shown in Extended Data Fig. 1i-k,
including an uncycled electrode for reference and two containing type II dendrites.
The O K-edge reference spectra for Li₂O₂ and LiOH and the Li K-edge reference
spectrum for LiH were taken on a 200-kV FEI F20 using the same cryo-transfer
holders and loading techniques.

The probe current for EELS maps on the Titan was around 25 pA, confirmed
by measurement on a direct electron detector with a high dynamic range, and
pixel dwell times were 10-50 ms. The electron dose applied during acquisition
of the spectroscopic maps shown in the main text was 5 × 10[…] to 5 × 10[…] e⁻ Å⁻².
The small bubbles in the electrolyte appeared rapidly, beginning by the time the
first image was taken with a total dose below 10 e⁻ Å⁻². These bubbles were
probably hydrogen liberated from the electrolyte solvent molecules, since C and
O K-edge fine structures in carbonates are known to be stable under the beam
up to a dose of about 750 e⁻ Å⁻² at room temperature under a 200-keV electron
beam. The threshold damage for these materials under our cryogenic conditions
using a 300-keV beam should be higher than this. A series of maps of the
electrode-electrolyte interface taken at various total doses is shown in Extended
Data Fig. 6c, demonstrating the doses at which different types of damage occur.
The damage mechanisms of these carbonate solvents are liberation of hydrogen at
low doses, resulting in structural changes such as the bubbling observed. At doses
of more than 10[…] e⁻ Å⁻², mass loss becomes important and the fine structure of
the approximately 247-eV peak is affected. At 10[…] e⁻ Å⁻² the mass loss is severe,
producing holes in the sample and leaving behind mainly the carbonate portion of
the solvent molecules. However, the fine structure associated with this part of the
molecule survives high doses. These findings are summarized in Extended Data
Table 2. On the basis of our damage analysis, we do not expect the fine structure
to have been altered in the maps shown in the main text, although slight structural
modifications were present, as expected (Extended Data Fig. 6d, e).
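
As a rough illustration of how a probe current and pixel dwell time translate into a dose, the sketch below converts them into electrons per square ångström. It is an assumption-laden estimate rather than the authors' procedure: it supposes that the charge delivered during one dwell is spread uniformly over a square pixel, and the pixel size is a hypothetical input that the text does not state.

```python
# Convert probe current and dwell time into an areal electron dose.
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron (exact SI value)


def dose_per_square_angstrom(probe_current_pA, dwell_time_ms, pixel_size_nm):
    """Return the dose in electrons per square angstrom for one map pixel.

    Assumes the beam charge is spread evenly over a square pixel of side
    pixel_size_nm (a simplifying assumption; real probes are not uniform).
    """
    charge_C = (probe_current_pA * 1e-12) * (dwell_time_ms * 1e-3)
    electrons = charge_C / ELEMENTARY_CHARGE_C
    pixel_side_A = pixel_size_nm * 10.0       # 1 nm = 10 angstrom
    return electrons / pixel_side_A ** 2
```

For example, `dose_per_square_angstrom(25, 10, 10)` uses the quoted 25 pA and 10 ms with a hypothetical 10-nm pixel; smaller pixels or longer dwell times raise the dose in proportion.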

LiH spectra have been measured previously using various techniques and are
not all in agreement. The moisture sensitivity of the material may explain the
disagreement. While precautions were taken in previous work to avoid air
exposure, such as cleaving the LiH in vacuum or transferring to the microscope in
an argon bag, reactions with small amounts of contaminants could still occur at
room temperature. Accurate spectra could be obtained either by measuring a bulk
sample, as the passivating layer would enable the majority of the material to remain
unreacted, or by maintaining the sample at cryogenic temperatures, since reactions
with contaminants would be essentially eliminated. As expected, the spectra from
samples measured in bulk or at cryogenic temperatures are in agreement with each
other and with our data. The difficulty of characterizing unaltered LiH may be
one reason why LiH dendrites have not been observed before, and is an important
example of how cryogenic techniques such as cryo-FIB lift-out and cryo-STEM
enable accurate characterization of systems with reactive materials as well as of
solid-liquid interfaces.
Acquisition of reference spectra. The reference spectra for the lithium metal and
Li₂O samples were acquired on the Titan in similar conditions to those described
above. The Li K-edge for lithium metal was recorded on an uncycled lithium electrode
and the Li₂O spectrum was recorded on a lithium electrode oxidized in the
microscope by warming to room temperature and exposing the electrode to the beam.
Both samples were produced by cryo-FIB lift-out. The electron dose for the metal
spectrum was 10[…] e⁻ Å⁻² and no change in fine structure was recorded by doubling
this dose. The oxide spectra were acquired with a total dose of around 10[…] e⁻ Å⁻².
While we observed that other lithium-oxygen compounds converted to Li₂O under
the beam, no change to the Li₂O fine structure was observed at high doses.
We acquired the lithium peroxide and hydroxide reference material from Sigma
Aldrich, crushed them into a fine powder using a mortar and pestle and pressed a
holey carbon TEM grid onto the powder to adhere some to the grid. The LiH was
prepared in an argon-filled glovebox owing to its air sensitivity and removed in a
sealed vial, which was opened under liquid N₂, eliminating air exposure. The other
stable samples were also immediately placed under liquid N₂ after preparation to
minimize unnecessary air exposure. On the F20, we used a probe current of about
75 pA. A total dose of less than 10[…] e⁻ Å⁻² was applied to the oxides during
acquisition, which we found was a few times lower than the dose necessary to induce
a substantial change in the O K-edge fine structure. The LiH Li K-edge spectrum
was acquired with a total dose of the order of 10[…] e⁻ Å⁻², and no change in fine
structure was observed under any dose, measured to greater than 10[…] e⁻ Å⁻². A
summary of the threshold damages for materials relevant to this study and the
corresponding damage mechanisms is shown in Extended Data Table 2, and examples of
damage series profiles used to establish these values are shown in Extended Data Fig. 6.
To align the energy axis between the O K-edge spectra acquired on the Titan
and the F20, we used the Li₂O peak at about 535 eV. The Li K-edge spectra were
close enough to the zero-loss peak that no shifting of the spectra was necessary. In
addition, spectra acquired on the F20 were bandpass filtered by 0.6 eV to reduce
noise below the energy resolution of the microscope. This reduced high-frequency
noise while preserving larger features accurately, with only a slight reduction in
sharp peaks. An example is shown for the Li₂O₂ O K-edge in Extended Data Fig. 3b.
Preparation of the coin-cell battery. Symmetric lithium cells (CR2032 coin cells)
were assembled with two lithium electrodes (MTI Corp., 450 µm thick) and 1 M
lithium hexafluorophosphate (LiPF₆) in ethylene carbonate:dimethyl carbonate
(EC:DMC) (v:v = 1:1) as the electrolyte. Celgard 3501 was used as the separator. We
subjected the cells to galvanostatic charging for 24 h at a current density of 1 mA
cm⁻². A charging profile from one coin cell used is shown in Extended Data Fig. 7.
To plunge-freeze the samples, slush nitrogen was chosen to avoid detrimental
interactions of the electrolyte with typical organic cryogens. The slush was
produced in the cryo-FIB workstation by vacuum pumping liquid N₂ until solidified.
This enabled a higher cooling rate and reduced bubbling in the workstation. The
coin cells were opened at the cryo-FIB workstation and the electrodes separated
and immediately plunged into the slush nitrogen to preserve the electrolyte on the
electrode. The frozen samples were then transferred into the preparation chamber
attached to the cryo-FIB, typically sputter-coated with a 5-10-nm layer of metal
(Pt or Au/Pd) to reduce charging, and transferred into the cryo-FIB chamber. Lift-
out samples produced for cryo-STEM were transferred back to the workstation,
where they were loaded into cryogenic sample storage boxes under liquid N₂ and
transferred to a large liquid N₂ storage dewar. Cryo-TEM diffraction on lamellae
produced by cryo-FIB lift-out showed that the electrolyte was frozen amorphous,
and remained so throughout all of the preparation, transfer and characterization
steps, as shown in Extended Data Fig. 8.
Three-dimensional reconstruction of cryo-FIB cross-sections. To reconstruct
the three-dimensional dendrite structures, we used Avizo software (Thermo Fisher
Scientific). Since the geometry of the FIB results in images of the cross-sections
taken at oblique angles to the cross-section surface (the electron and ion columns
are separated by 52° and the sample surface normal is positioned parallel to the
ion beam for milling), the individual images (up to 50) were aligned vertically by the
position of the electrode surface and the appropriate length transformations were
applied to the images to correct for the oblique viewing angle ($y = y'/\cos\theta$ and
$z = z'/\sin\theta$, where $y$ and $z$ are the true depth and height of the object,
respectively, $y'$ and $z'$ are the observed depth and height, and $\theta$ is the angle between
the electrode surface normal and the electron beam). The dendrite and electrode
structures were segmented by hand within each of the cross-sectional images, and
these segmentations were connected in the perpendicular direction to reconstruct
the three-dimensional structure.
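
The length correction for the oblique viewing angle can be written out as a short function. This is a minimal sketch that assumes the relations are y = y'/cos θ and z = z'/sin θ, with θ the angle between the electrode surface normal and the electron beam; the function name and the example value of 52° are illustrative choices, not taken from the authors' software.

```python
import math


def correct_oblique_view(observed_depth, observed_height, theta_deg):
    """Recover true depth y and height z from lengths seen at an oblique angle.

    theta_deg is the angle between the electrode surface normal and the
    electron beam. Depth is foreshortened by cos(theta) and height by
    sin(theta), so each observed length is divided by the matching factor.
    """
    if not 0.0 < theta_deg < 90.0:
        raise ValueError("theta_deg must lie strictly between 0 and 90")
    theta = math.radians(theta_deg)
    true_depth = observed_depth / math.cos(theta)
    true_height = observed_height / math.sin(theta)
    return true_depth, true_height
```

With θ = 52°, an observed height is stretched by about 1/sin 52° ≈ 1.27 and an observed depth by about 1/cos 52° ≈ 1.62.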
EELS map processing. The large field of view of some EELS maps results in an
energy shift of the entire spectrum at different points in the map. To accurately
map edges and analyse edge fine structure across the field of view, the energy axis at
each pixel was shifted to the proper location. Each map was acquired in DualEELS
mode, with both low-loss and high-loss regions of the spectrum recorded. The
low-loss maps included the zero-loss peak, the position of which was used to align
the energy axis of the low- and high-loss regions of the spectra simultaneously,
resulting in a flat energy surface across the map.
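
The zero-loss alignment described above can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the zero-loss peak is the strongest channel of each low-loss spectrum, that the low- and high-loss spectra share the same energy dispersion, and that whole-channel shifts are sufficient. All function and parameter names are hypothetical.

```python
def shift_spectrum(spectrum, shift, fill=0.0):
    """Shift a spectrum by a whole number of channels, padding with `fill`."""
    n = len(spectrum)
    shifted = [fill] * n
    for channel, counts in enumerate(spectrum):
        target = channel + shift
        if 0 <= target < n:          # drop channels pushed off either end
            shifted[target] = counts
    return shifted


def align_to_zero_loss(low_loss_map, high_loss_map, target_channel):
    """Align every pixel so its zero-loss peak sits at `target_channel`.

    Each map is a list of spectra (one per pixel). The shift found from the
    low-loss spectrum is applied to the matching high-loss spectrum too.
    """
    aligned_low, aligned_high = [], []
    for low, high in zip(low_loss_map, high_loss_map):
        # Take the brightest low-loss channel as the zero-loss peak position.
        zero_loss_channel = max(range(len(low)), key=low.__getitem__)
        shift = target_channel - zero_loss_channel
        aligned_low.append(shift_spectrum(low, shift))
        aligned_high.append(shift_spectrum(high, shift))
    return aligned_low, aligned_high
```

A sub-channel alignment (for example by fitting the peak position and interpolating) would follow the same pattern with a fractional shift.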

To map elemental distributions, standard background subtractions were performed using a linear combination of power laws fit and local background averaging with full-width at half-maximum of five pixels to increase the background signal-to-noise ratio. Energy windows wider than the fine structure at the edge onset were integrated for elemental mapping, to minimize the effects of spatially varying fine structure on the apparent elemental concentrations. The fine structure of the edges was analysed by multivariate curve resolution (MCR), which solves for a specified number of linearly independent spectral components in the data by means of local minimization. A non-negativity constraint was imposed on the corresponding concentration profiles since negative concentrations are not physical, but the spectra were not constrained. To improve the signal-to-noise ratio for the MCR process, the data were typically binned by four (spatially) before analysis. To display the spatial distribution of the resulting spectral components, we fitted them back to the original data using Matlab's QR solver, which takes advantage of QR factorization to minimize the residual of the equation SC = D. In our case, S is the matrix of spectral components returned by MCR, C is the matrix of concentrations to be solved for and D is the matrix of original data. Using the original unbinned data (for the type I maps) or the original data binned by two (for the type II maps), we produced maps from the concentration matrix C for the corresponding MCR spectral components, such as in Fig. 3.
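The fit of the MCR components back to the original data amounts to a linear least-squares solve of SC = D. The sketch below shows the same idea with NumPy rather than Matlab; it is an assumed analogue, not the authors' script. The function name `fit_components` and the array layout (channels along rows, pixels along columns) are illustrative choices, and S is assumed to have full column rank.

```python
import numpy as np

def fit_components(S, D):
    """Least-squares solve S @ C = D for C using a QR factorization.

    S: (n_channels, n_components) spectral components, one per column.
    D: (n_channels, n_pixels) original spectra, one pixel per column.
    Returns C with shape (n_components, n_pixels).
    """
    # Reduced QR: S = Q R, Q has orthonormal columns, R is upper triangular.
    Q, R = np.linalg.qr(S)
    # Minimizing ||S C - D|| reduces to the square system R C = Q^T D.
    return np.linalg.solve(R, Q.T @ D)
```

Each row of the returned C can be reshaped to the map dimensions to display one component. Note that this solve is unconstrained, so unlike the MCR step itself it can return small negative concentrations where a component is absent.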

Data availability. The data that support the findings of this study are available from the corresponding author on reasonable request.
Extended Data Fig. 1 | Schematic and SEM images of the cryo-FIB lift-out sample preparation process, and examples of additional final lamellae. a, A buried structure or interface is identified for preparation, here a dendrite embedded in frozen electrolyte above the anode (indicated by the red arrow). In our coin-cell batteries, raised regions of electrolyte were used to localize buried dendrites. b, e, f, Trenches are then site-specifically milled around the site of interest, forming a vertical cross-sectional lamella containing the structure or interface. The sample is aligned in the microscope so that the electrode surface normal is parallel to the electron beam direction in e and tilted by 52° to image the lithium anode-electrolyte interface and the electrolyte-embedded dendrite in f. c, g, A cooled nanomanipulator needle is then attached to the cryo-immobilized lamella by water vapour from a gas-injection system deposited as amorphous ice. The lamella is then cut free from the sample and lifted out. d, h, Finally, the lamella is attached to a TEM grid post with additional ice deposition, cut free from the nanomanipulator and thinned to electron transparency with the ion beam. i, j, Lamellae containing type I and II dendrites above lithium electrodes. The lamella in i contains a fluorine-rich structure as well. Different electrolyte thicknesses and milling parameters were used to prepare these lamellae, resulting in different final dimensions. k, A lamella produced from an uncycled electrode, used to obtain reference spectra. The increased signal of the uncycled electrode is due to different image-acquisition parameters, not a material difference.
Extended Data Fig. 2 | Elemental maps of the regions near both types of dendrite surface, carbon-bonding environment maps resulting from fitting of MCR spectra back to original data, and corresponding summary schematics of both dendrite types and their SEI layers. Carbon, oxygen and fluorine are shown in a composite map in Fig. 2. a, d, Individual elemental maps showing the full count range, excluding 10.2% of high- and low-intensity outliers, make it clear that there is a substantial concentration of oxygen in the type I dendrite and very little in the type II dendrite, and that there is increased oxygen in the type I SEI compared to the electrolyte. In addition, essentially no fluorine is present in the type I SEI, and the large fluorine-rich structure contains a higher fluorine concentration than the electrolyte. Nitrogen maps are included as well, and largely show noise with little spatial dependence. Corresponding count scale bars are shown next to each map. b, Individual maps determined by MCR corresponding to the spectra primarily located at (and labelled in Fig. 3 as) the SEI, electrolyte and fluorine-rich structure (top to bottom), displaying the original counts. e, Individual plasmon maps determined by MCR for LiH, lithium and the electrolyte (top to bottom) displaying the original counts. f, Top, type I dendrites consist of partially oxidized lithium metal with small Li regions at the surface, and have an extended SEI layer consistent with lithium ethylene dicarbonate (LEDC) that contains bubbles, probably from ethylene, a by-product of the SEI formation. Large fluorine-rich structures are often found near the dendrites. Bottom, type II dendrites consist of uniform LiH and have a compact Li₂O/LiOH·H₂O SEI layer. Although not depicted, fluorine-rich structures were also observed near type II dendrites. Scale bars, 300 nm.
Extended Data Fig. 3 | Comparison of the type II SEI oxygen K-edge with reference spectra and an example of a bandpass-filtered spectrum. a, The O K-edge of the type II dendrite appears to be consistent with a combination of Li₂O and LiOH·H₂O. Spectra are offset vertically for clarity. b, A 0.-eV bandpass (BP) filter was applied to the O K-edge spectra acquired on the F20 to remove high-frequency noise. This preserved the main features of the edge while eliminating those below the energy resolution of the instrument.
Extended Data Fig. 4 | Cryo-FIB, cryo-STEM EELS and electrochemical results comparing lithium deposition in cells using traditional and full-fluoride electrolytes. a, b, Cryo-FIB reveals that the dendrite density is much lower for the full-fluoride fluoroethylene carbonate (FEC) electrolyte (b) than with the traditional EC-DMC electrolyte (a). In the former case, nearly no LiH dendrites are present and the lithium deposition is modified, forming broad localized depositions. d, Cross sections of these deposits reveal that they are composed of many smaller blocks in contact, separated by SEI layers. e, f, A lamella of this type of deposition was prepared by cryo-FIB lift-out (e) and cryo-STEM EELS of the Li K-edge of the material revealed that it is composed of partially oxidized lithium metal (f), as was the type I dendrite in the traditional electrolyte. g, The Coulombic efficiency measured in a lithium versus stainless steel set-up using a constant current density of 1 mA cm⁻² and capacity of 1 mAh cm⁻² was greatly improved for the full-fluoride electrolyte compared to the traditional electrolyte. h, Cycling of a full cell comprising a lean lithium anode (50 µm) and a nickel manganese cobalt oxide (NCM) cathode (2 mAh cm⁻²) with the full-fluoride electrolyte resulted in a substantial decrease in capacity fade and improved Coulombic efficiency over the traditional electrolyte. The discharge capacity is plotted on the left axis, whereas the Coulombic efficiency is on the right axis. The operating voltage range was 4.3 V to 3 V. In all figures, the red lines and symbols represent results for the EC:DMC, 1 M LiPF₆ electrolyte, whereas the black lines and symbols are for FEC, 2 M LiPF₆.
Extended Data Fig. 5 | Full spectra recorded from the dendrites (intensities on a logarithmic scale). The spectra show clear differences in the plasmons and Li K-edges, as well as a large difference in oxygen content between the type I and type II dendrites. The small amount of oxygen on the type II dendrites is probably due to water molecules adsorbed on the surface of the sample in the microscope vacuum, which would typically react with materials such as lithium or sodium at room temperature. No nitrogen was present in either dendrite, confirming that no reaction with nitrogen in the air or liquid N₂ had occurred.
Extended Data Fig. 6 | Example damage series profiles and initial/final spectra taken for lithium materials relevant to this study over a range of doses at which damage occurs, dark-field cryo-STEM images of various types of damage induced in frozen organic electrolyte at different doses with corresponding spectra, and before and after images of the regions in which the EELS maps in the main text were taken. All spectra were recorded at cryogenic temperature. a, We found all oxide materials convert to Li₂O under large doses. Li₂O and Li are primarily affected by mass loss, with no substantial changes in fine structure. The maps presented in the main text were acquired at doses lower than the dose indicated by the red arrows shown at the bottom of the plots, of
the order of 1022 A bye. While some structural modification of the electrolyte material was present at low doses, probably due to liberation of hydrogen, a dose greater than 10° <A = was required for substantial mass loss and modification of spectral fine structure. AL3 = 10°” A> approximately 50% of the material remained after the map, as determined by the ADF signal. At 10*e” A * the material was completely removed in some areas, but the carbonate portion of the molecule remained. Doses
applied during acquisition of the maps in the main text were less than the lowest dose shown here. Spectra are offset vertically for clarity. d, e, In the maps displayed in the main text, small structural changes were observed in the organic material, which is expected given our damage analysis. This is probably due to liberation of hydrogen from the molecules, which occurs at low dose. The fine structure is not greatly affected until approximately an order of magnitude higher dose than was applied during these maps, which was of the order of 10° & A~*. Scale bars, 200 nm, 30 nm and 60 nm (b, left to right), 300 nm (d, e).
Extended Data Fig. 7 | Charging profile from a symmetric lithium coin cell. A constant current of 1 mA cm⁻² was applied to the cell for 4 h (bottom). The resulting voltage profile from one of the coin cells used is shown in the top panel.




Extended Data Fig. 8 | Amorphous diffraction pattern of the electrolyte recorded in a cryo-lamella produced by cryo-FIB lift-out. Cryo-TEM diffraction of the electrolyte on samples produced by cryo-FIB lift-out shows that it is frozen amorphously and does not recrystallize at any point during the preparation, storage, transfer or characterization.
Extended Data Table 1 | Comparison of the properties of type I and II dendrites
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Extended Data Table 2 | Threshold electron doses and primary 
damage mechanisms observed for relevant materials 
Building C(sp³)-rich complexity by combining
cycloaddition and C-C cross-coupling reactions


Tie-Gen Chen, Lisa M. Barton, Yutong Lin, Jet Tsien, David Kossler, Iñaki Bastida, Shota Asai, Cheng Bi, Jason S. Chen, Mingde Shan, Hui Fang, Francis G. Fang, Hyeong-wook Choi, Lynn Hawkins, Tian Qin & Phil S. Baran


Prized for their ability to rapidly generate chemical complexity by building new ring systems and stereocentres¹, cycloaddition reactions have featured in numerous total syntheses² and are a key component in the education of chemistry students³. Similarly, carbon-carbon (C-C) cross-coupling methods are integral to synthesis because of their programmability, modularity and reliability⁴. Within the area of drug discovery, an overreliance on cross-coupling has led to a disproportionate representation of flat architectures that are rich in carbon atoms with orbitals hybridized in an sp² manner⁵. Despite the ability of cycloadditions to introduce multiple carbon sp³ centres in a single step, they are less used⁶. This is probably because of their lack of modularity, stemming from the idiosyncratic steric and electronic rules for each specific type of cycloaddition. Here we demonstrate a strategy for combining the optimal features of these two chemical transformations into one simple sequence, to enable the modular, enantioselective, scalable and programmable preparation of useful building blocks, natural products and lead scaffolds for drug discovery.

Retrosynthetic chemical analysis is built upon the strategic identification of the reactions (transforms) that offer the greatest potential to simplify target preparation (the ‘T-goal’). To this end, the capacity of cycloadditions to generate complex ring systems and multiple bonds with precise stereochemical control is unmatched. By contrast, cross-couplings such as Heck, Suzuki and Negishi reactions are capable of making only one bond at a time (most often between flat sp² systems) yet are the most used C-C bond-forming methods in the patent literature. To understand this phenomenon, it is worthwhile to compare the features of these two diverse reaction classes (Fig. 1a). Cycloadditions form two new sigma bonds, generally through concerted pathways that follow precise rules for predicting stereo- and regiochemistry. Starting with specialized building blocks that are designed to enable the reaction to take place cleanly, this process rapidly accesses complexity. On the other hand, C-C cross-couplings form one new sigma bond between easily accessible building blocks, using a transition-metal catalyst to reliably and controllably produce new connections. What C-C cross-coupling lacks in terms of complexity generation, it makes up for in terms of sheer reliability and modularity.

These distinct features are illustrated by the syntheses of the alkaloid epibatidine (1) and the commercial antihypertensive medicine Cozaar (2) (Fig. 1a). Epibatidine, coveted for its pronounced analgesic properties, has been prepared by total synthesis more than 60 times (see Supplementary Information for a complete listing). Of these syntheses, at least 31 have used cycloaddition chemistry as their key ring-constructing step. The vast majority of these approaches involve formation of the bridged pyrrolidine core, followed by stepwise, and often lengthy, pyridine incorporation. Meanwhile, medicinal agent 2, bereft of any stereogenic centres or topological complexity, was heralded as one of the first examples of a ‘blockbuster’ drug. Both its discovery and its eventual manufacture used C-C cross-couplings (Ullmann and Suzuki reactions) for the key bond cleavage. This facilitated both a convergent assembly and a modularity that permitted the rapid exploration of hundreds of analogues.

Here, we sought to combine the innate complexity generation of cycloaddition with the simplicity and modularity of cross-coupling. When applied to structures such as 1, this strategy will permit the generation of analogues, and when applied to medicinal programs such as 2, it will allow for the rapid exploration of otherwise challenging complex chemical space.

Building block 3, of hypothetical value in medicinal chemistry, represents the manifestation of this idea (Fig. 1b). Although its structure would seem to suggest that it could be formed through a Diels-Alder reaction, the relevant synthetic building blocks, structures 4 and 5, are not electronically matched and therefore one would expect no reaction to take place. Even if a workable enantioselective Diels-Alder reaction could be invented to achieve this transform, the strategy would suffer from a lack of modularity. In order to solve this problem, one could envisage using a hypothetical vicinal dihaloethylene (6) in place of 5, with a fumarate-type dienophile such as 7 serving as a viable synthetic equivalent. The favourable matched electronics of the dienophile should allow for a facile Diels-Alder reaction and subsequent radical cross-coupling (RCC).

To address the enantioselectivity challenge, a combination of transforms was proposed (Fig. 1c). Maleic anhydride was chosen as a surrogate for the hypothetical chiral (pseudo)dihalide 6 given its inexpensive nature, ready participation in most cycloaddition modes ([2+1], [2+2], [3+2], [4+2]), and known desymmetrization through chiral Lewis base-mediated alcoholysis. Our sequence for generating complexity in a modular fashion involves five simple steps: first, cycloaddition to build a scaffold; second, desymmetrization to set absolute stereochemistry; third, RCC to install a new C-C bond; fourth, hydrolysis; and fifth, RCC to forge another new C-C bond. The known versatility of RCC enables a range of functionalities to be installed, from aryl and heteroaryl systems to alkenes, alkynes and alkyl groups. We describe here the preparation of more than 80 synthetic examples and multiple applications, covering a range of natural products (including the synthesis of 1) and real-world examples from industrial settings.

Studies commenced with Diels-Alder [4+2] cycloaddition (Fig. 2), by which a large variety of enantiomerically enriched scaffolds could be produced in a simple modular fashion. First, scaffolds Ay-Ag (Az being a Diels-Alder adduct and all others being derived from Diels-Alder/hydrogenation) were desymmetrized using Deng's conditions, with either quinine or quinidine used as the Lewis base to deliver mixed acids/esters in excellent enantioselectivity. Next, the mono-acid substrate was subjected to successive Negishi- and Suzuki-type RCC reactions, depending on starting-material availability or individual preferences. In this manner, some of the most simple and inexpensive Diels-Alder adducts known (Ay costs US$38.54 per mol and Aa is US$9.52 per mol) can be transformed into enantioenriched products (11-63) of high value. Indeed, none of these products can at present be prepared using a Diels-Alder reaction (racemic or enantioenriched) because in most cases the reaction would require both an electron-rich diene and an electron-rich dienophile and therefore is electronically unfavoured. Furthermore, controlling the chemoselectivity when there are multiple alkenes present in the dienophile is challenging using traditional Diels-Alder chemistry; however, with our method, diverse alkenes and alkynes can be easily installed post-cyclization, with excellent isomeric and geometrical purity (accessing 12, 15, 19, 21, 25 and 31). Typically, scaffolds derived from Diels-Alder adducts Ar-Ag have a clear retrosynthetic signature, wherein the electron-withdrawing groups on the dienophile have been homologated, alkylated, or degraded. Therefore, our approach allows access to the previously unexplored chemical space of electron-rich, chiral, 1,2-disubstituted Diels-Alder scaffolds. Using the tactical combination outlined above, a virtually limitless array of substituents is now easily accessed, including halogenated arenes, Lewis-basic heterocycles, α,β-unsaturated carbonyl groups, cyclopropanes, cyclobutanes, sulfur moieties, and alkenes. In the case of exo and endo isomers Ag and As (both commercially available), the sequences converged to an identical product using the same coupling partners (27 could be prepared from Ay or As).

As shown in Fig. 2b-d, the strategy outlined above could also be applied to [3+2], [2+2] and [2+1] cycloadditions. The vast scope observed with Diels-Alder chemistry was also seen in these cases, accessing substituents such as substituted olefins, terminal alkynes, homoallyl groups, and heterocycles. Pyrrolidine-containing systems could be derived from simple building blocks By and By (accessed through dipolar cycloaddition) to furnish 64-82 with high enantiomeric excess. Quick access to pyrrolidine-containing drug scaffolds is useful: historically these have been extremely relevant to medicinal chemists, with around 40 pharmaceuticals containing this motif. As with the Diels-Alder adducts, none of these structures has been accessed before. Therefore, our approach serves as a modular entry to chiral variants of these coveted scaffolds.

In a similar vein, enantioenriched cyclopentenes were easily produced from palladium-catalysed, trimethylenemethane-based, formal [3+2] cycloaddition adduct By. Such structures are highly challenging to access in any other way. Scaffolds C1 and C2, representing entry to [2+2]-derived systems, could be similarly processed. Modular access to enantioenriched cyclobutanes is important and useful given the strict electronic requirements for photochemical cycloaddition and the documented challenge in achieving general asymmetric induction. Access to 1,2-disubstituted cyclobutanes (83-86, derived from C,) compares favourably with photochemical approaches to such systems that frequently give inseparable racemic mixtures of regiomers and diastereomers. Furthermore, access to tetrasubstituted, chiral cyclobutanes is highly useful, as multiple families of dimeric and pseudodimeric cyclobutane natural products contain such structural motifs. Structures such as 87, which would be otherwise extremely difficult to access through conventional photochemistry, can be easily prepared in enantioenriched manner from maleic anhydride heterodimerization adducts such as C;. Finally, the strategy outlined above when applied to [2+1] cycloadditions using scaffolds Dy and D3 is a major departure from the common retrosynthetic logic normally applied to such structures. Conventional approaches, usually involving late-stage cyclopropanation of an olefin, suffer from lack of enantiocontrol in the absence of directing groups or specialized carbenoid donors and complex catalysts. Structures 94 and 95 (Ds-derived) are particularly illustrative of this fact: it would be extremely challenging to access either of these in an enantioselective fashion with current synthetic technology (cyclopropanation or C-C cross-coupling). As testament to the power of this strategy to access diverse libraries, we synthesized an additional 48 enantioenriched compounds in similar manner (see Extended Data Figs. 1 and 2 for details).

To further demonstrate the potential of this approach to simplify synthesis, we present six applications in the total synthesis of natural products and in both early- and late-stage medicinal chemistry programs (Fig. 3a-f). As mentioned above, alkaloid 1 has been a popular target of both academic and industrial scientists. Through the application of cycloaddition and cross-coupling, the native carboxylate required for the Diels-Alder reaction can be used directly to produce epibatidine (1) in five steps (for optimization and in-depth analysis of previous approaches, see Supplementary Information) with a 38% gram-scale overall yield
Fig. 2 | Substrate scope of combining cycloaddition and C-C cross-coupling. a-d, The cycloaddition component is shown in black, the first cross-coupling in green and the second cross-coupling in blue. The yield and enantiomeric excess (e.e.) refer to the second cross-coupling. Besides compound 89 (diastereomeric ratio (d.r.), see Extended Data Fig. 2) excellent diastereoselectivity (d.r. greater than 10/1) was observed in all


(Fig. 3a). It is worth noting that the key decarboxylative cross-coupling takes place with 95% isolated yield (gram-scale, 72% yield).

Saphris (asenapine, 102; Fig. 3b), an antipsychotic approved by the US Food and Drug Administration (FDA), is currently marketed as a racemic mixture (although the (+)-isomer has superior

cross-couplings. See Extended Data Figs. 1 and 2 for complete substrate scope and Supplementary Information for synthetic details. X-ray structure data are available for compounds 11, 19, 23, 25, 28, 65 and C2. N, Negishi cross-coupling; S, Suzuki cross-coupling; K, Kumada cross-coupling; Boc, tert-butyloxycarbonyl; TIPS, triisopropylsilyl; Ts, tosyl.


pharmacokinetic properties). This near-symmetric molecule has been challenging to prepare enantioselectively, as the two aromatic systems differ only in one chlorine substituent. It is therefore hard to envisage a cycloaddition that could be rendered enantioselective for the preparation of 102, and only one enantioselective approach has been reported
Fig. 3 | Applications of combining cycloaddition and C-C cross-coupling. a, Gram-scale synthesis of (+)-epibatidine. b, Asymmetric synthesis of asenapine (Saphris). c, Modular synthesis of intermediate 105, a fragment used in the synthesis of a cyclobutyl-containing analogue of epothilone. d, Synthesis of a Leo Pharma key intermediate. e, Synthesis of an Eisai Pharmaceuticals key intermediate. f, New chiral 2π synthons for cycloaddition: application to the modular asymmetric synthesis of an inhibitor of the EED protein. Excellent diastereoselectivity


in 16 steps. However, 102 can be prepared in formally six steps, with complete enantiocontrol, from Bz, through a strategy that could also be used to make an array of near-symmetric analogues.

Epothilone, a famous natural product that inhibits the dynamics of cellular microtubules and inspired the FDA-approved medicine Ixabepilone, has been the subject of numerous synthetic studies and analogue campaigns. Intermediate 105 (Fig. 3c) has been used during Nicolaou's synthesis of a cyclobutyl-containing analogue of epothilone, but required 15 steps (24% overall yield) to be prepared in enantioenriched form from C,, via enzymatic desymmetrization and a series of carefully choreographed homologations. Using the same starting material, C, we have prepared intermediate 105 in only eight steps, through sequential desymmetrization, RCC with alkyne 103, hydrolysis, RCC with alkyl zinc 104, protecting-group exchange, and finally hydroboration/oxidation.

Differentially substituted norbornene rings have been shown to be useful phenyl bioisosteres in medicinal chemistry; however, their broad implementation is hindered by a lack of methods for their rapid modular construction (unlike the construction of aryl systems). In collaboration with Leo Pharma, we prepared a key target for an ongoing program (109; Fig. 3d) from Diels-Alder adduct 106 via alkene hydration and RCC with the pyrazole-boronic ester 108 in 44% yield. It is




(d.r. greater than 10/1) was observed in all cross-couplings. See Supplementary Information for full synthetic details and schemes. X-ray structure data are available for compounds 1·2HCl, 107 and 118·2HCl. Boc, tert-butyloxycarbonyl; Pin, pinacol group; TBDPS, tert-butyldiphenylsilyl; TBS, tert-butyldimethylsilyl; TCNHPI, tetrachloro-N-hydroxyphthalimide; THP, tetrahydropyranyl; TIPS, triisopropylsilyl; TMS, trimethylsilyl.


worth noting that this is, to our knowledge, the first report of an RCC reaction using a bis(pinacolato)diboron (Bpin) derivative rather than boronic acid. This advance was achieved using an in situ prepared ate complex, prepared by adding one equivalent of n-BuLi relative to the aryl-Bpin donor. We also used this modification of our Suzuki-RCC protocol to prepare cyclopropanes 91, 92 and 93.

Finally, an ongoing program at Eisai Pharmaceuticals necessitated the preparation of complex scaffold 114 (Fig. 3e), wherein the key structure-activity relationship to be explored resided at the aryl (green) portion of the molecule. This is a particularly powerful application of the strategy outlined herein, because a carboxylate (needed to achieve the diastereoselective Diels-Alder reaction to construct the decalin framework) served as a gateway for the exploration of chemical space at the desired position. Thus, an asymmetric Diels-Alder reaction using Corey's oxazaborolidine catalyst, followed by functional group manipulations (see Supplementary Information), led to intermediate 112, which could be cross-coupled with boronic acid 113 in a particularly challenging context to deliver 114 and enable biological follow-up.

The strategy outlined here could be useful for more than just forging C-C linkages through cross-coupling. The incorporation of classic nucleophilic functionalization (in ketone synthesis) and decarboxylative functionalization (in the Hunsdiecker reaction and in Curtius and Wolff rearrangements) of intermediate adducts opens up innumerable possibilities for diversification, rendering access to previously inaccessible building blocks (Fig. 3f). As an example, we used a chiral 2π enamine synthon to prepare 118, an inhibitor of the embryonic ectoderm development (EED) protein (Ki = 4 nM), which was previously prepared by AbbVie through a non-modular, racemic [3+2] approach in 1.9% overall yield. Our approach began with intermediate meso scaffold B, and used RCC followed by Curtius rearrangement, yielding optically pure 118 in 13% overall yield.

The advance that we have described is largely strategic in nature, and thus the underlying limitations are tied mainly to individual parameters of a particular cycloaddition and ensuing RCC reactions. That said, cis-1,2-disubstituted products are not currently accessible unless downstream isomerization reactions are pursued, which need to be evaluated on a case-by-case basis. Although ligand-controlled RCC reactions might eventually address this issue, in its present form this platform for modular molecular assembly holds great promise for accessing new areas of chemical space. The combination of classic cycloaddition chemistry with newly emerging radical C-C coupling offers a powerful way to repurpose the most classic skeleton-building reactions of organic synthesis, to simplify the enantioselective preparation of building blocks, natural products and medicines.


Data availability 
Metrical parameters for the structures of 11, 19, 23, 25, 28, 44, 45, 49, 65, C2, 107, 1·2HCl and 118·2HCl are available free of charge from the Cambridge Crystallographic Data Centre (CCDC) under reference numbers 1837572, 1837575, 1837377, 1837578, 1837579, 1937573, 1837374, 1937576, 1821860, 1821878, 1821879, 1825177 and 1838237, respectively. Data are available from the corresponding author on reasonable request.
Cooperative asymmetric reactions combining
photocatalysis and enzymatic catalysis


Zachary C. Litman, Yajie Wang, Huimin Zhao & John F. Hartwig


Living organisms rely on simultaneous reactions catalysed by mutually compatible and selective enzymes to synthesize complex natural products and other metabolites. To combine the advantages of these biological systems with the reactivity of artificial chemical catalysts, chemists have devised sequential, concurrent, and cooperative chemoenzymatic reactions that combine enzymatic and artificial catalysts. Cooperative chemoenzymatic reactions consist of interconnected processes that generate products in yields and selectivities that cannot be obtained when the two reactions are carried out sequentially with their respective substrates. However, such reactions are difficult to develop because chemical and enzymatic catalysts generally operate in different media at different temperatures and can deactivate each other. Owing to these constraints, the vast majority of cooperative chemoenzymatic processes that have been reported over the past 30 years can be divided into just two categories: chemoenzymatic dynamic kinetic resolutions of racemic alcohols and amines, and enzymatic reactions requiring the simultaneous regeneration of a cofactor. New approaches to the development of chemoenzymatic reactions are needed to enable valuable chemical transformations beyond this scope. Here we report a class of cooperative chemoenzymatic reaction that combines photocatalysts that isomerize alkenes with ene-reductases that reduce carbon-carbon double bonds to generate valuable enantioenriched products. This method enables the stereoconvergent reduction of E/Z mixtures of alkenes or reduction of the unreactive stereoisomers of alkenes in yields and enantiomeric excesses that match those obtained from the reduction of the pure, more reactive isomers. The system affords a range of enantioenriched precursors to biologically active compounds. More generally, these results show that the compatibility between photocatalysts and enzymes enables chemoenzymatic processes beyond cofactor regeneration and provides a general strategy for converting stereoselective enzymatic reactions into stereoconvergent ones.

To develop a cooperative chemoenzymatic reaction (Fig. 1) that enables a stereoconvergent enzymatic reduction of an isomeric mixture of alkenes, initial reactions were conducted with the model substrate 2-phenylbut-2-enedioic acid dimethyl ester (1a). YersER, an ene-reductase isolated from Yersinia bercovieri, exclusively reduces the E isomer of 1a ((E)-1a) to dimethyl 2-phenylsuccinate (2a) in high yield and with excellent enantioselectivity in the presence of a glucose dehydrogenase enzyme for cofactor regeneration (Supplementary Table 3). The observed selectivity results from interactions between the E and Z isomers of 1a and the active site of the enzyme. These


Fig. 1 | Chemoenzymatic reactions a, Types 
of chemoenzymatic reaction. At least one 
enzymatic catalyst and one chemical catalyst 
Js present in all reactions (1) Reactions 
‘conducted in a sequential manner with an 
intermediate step in which reaction conditions 
are altered or reagents or catalysts are added. 
(2) Two irreversible reactions conducted in 

4 simultaneous manner. (3) A cooperative 
reaction consisting ofa reversible reaction 
combined with an ireversible reaction run 
simultaneously (4) A cooperative eaction in 
‘which a sub-stoichiometric quantity af cofactor 
for reagent ie recycled and both catalysts 
‘operate simultaneously. The prime indicates 
an altered oxidation sate ofthe cofactor. 

See Supplementary Fig. 1 for additional 
explanation and definitions. b, Combination 
fof photocatalytic isomerization and enzymatic 
reduction of alkenes. The asterisk indicates 
the chiral center. Ar, aryl; GDH, glucose 
dehydrogenase; NADP(H), (reduced) 
nicotinamide adenine dinucleotide phosphate 
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Fig. 2 | Simultaneous isomerization and reduction of (Z)-1a. 
a, Reduction of (Z)-1a in the presence ofa series of photocatalystsand 
biue light. Photostationary state (PS) isthe percentage of ( 
present after (Z)-1a was irradiated with blue light inthe presence of 5% 
photocatalyst for 24h, Experiments labelled ‘None’ were conducted in 
the absence ofa photocatalyst, but the presence of ambient light; those 
belled'None’ were conducted in the absence ofa photocatalyst, but 
the presence of blue ight. The photostationary state values for 
experiments labelled ‘None’ and None’ have been corrected for initial 
trace quantities of (£)-1a_b, Structures of I-16 and FMN. EosY, eosin ¥ 
ER, ene-reductase; FAD, flavin adenine dinucleotide; Fluor, fluoresce! 
EMN, flavin mononucleotide; Ribo, riboflavin; RT, room temperature: 
Rubpy, [Ru(bpy)sICl: RuBpz, [Rulbp2)s](PF,):(bpy.22-bipyridine; 
bpz,22!-bipyrazine). 


interactions have been investigated for related diester and cyanoacrylate 
substrates!*!®, To enable stereoconvergent reduction reaction, it was 
necessary to identify a catalyst for the E/Z isomerization of olefins that 
{compatible with ene-reductases. An appropriate isomerization cata- 
lyst would operate in aqueous solution at ambient temperature; remain 
active in the presence of ene-reductases, substrates, and products in 
the reaction mixture; isomerize olefins atthe low substrate concentra- 
tions required for enzymatic reduction; and generate the more reactive 
isomer of a substrate from the less reactive isomer. The isomerization 
catalyst must also be tolerated by the ene-reductases; be mutually com: 
patible with a regeneration system comprising glucose dehydrogenase 
and reduced nicotinamide adenine dinucleotide phosphate; and not 
racemize the product. 

After determining that negligible thermal isomerization of (Z)-1a occurs under ambient conditions (Fig. 2a), we considered that recently reported photocatalytic isomerizations of alkenes could be combined with enzymatic reduction to develop the proposed cooperative chemoenzymatic process. Our initial experiments assessed the photoisomerization of (Z)-1a in solvent mixtures appropriate for the reduction of alkenes by ene-reductases. Limited isomerization of (Z)-1a was observed in semi-aqueous media (1:9 DMSO:Tris buffer) in the presence of blue light (450-470 nm) and in the absence of a photocatalyst (Fig. 2a). However, extensive isomerization of (Z)-1a was observed when the reaction was conducted with riboflavin in the presence of blue light. A photostationary state consisting of a 9:1 ratio of (E)-1a to (Z)-1a was obtained after (Z)-1a was irradiated with blue light for 24 h in the presence of 5% riboflavin (Fig. 2a). However, modest yields of 2a were obtained when YersER and riboflavin were used in a simultaneous isomerization and reduction of (Z)-1a in the presence of blue light. This result demonstrated the importance of establishing compatibility between the enzyme and the photocatalyst. We proposed that competitive binding of riboflavin to the flavin-binding site of YersER led to the inhibition of enzymatic activity. Therefore, we sought alternative photocatalysts that would lead to isomerization of (Z)-1a without inhibiting YersER or the cofactor-regeneration system.

The E/Z isomerization of (Z)-1a was evaluated in the presence of a series of organometallic and organic photocatalysts in 1:9 DMSO:150 mM Tris buffer, and the photostationary state for each combination is recorded in Fig. 2. Greater than 40% conversion of (Z)-1a to (E)-1a was observed when the substrate was irradiated in the presence of the majority of the photocatalysts, and the highest E/Z ratios exceeded 8:1. Increasing catalyst loading and increasing light intensity enhanced the rate of photoisomerization of (Z)-1a (Supplementary Figs. 15, 16, 18).
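
As a small worked example of how the photostationary-state (PS) percentages relate to the E/Z ratios quoted above, the following sketch converts a ratio into the percentage of the E isomer. The function name `percent_e` is hypothetical and is not taken from the paper; it only restates the arithmetic.

```python
def percent_e(e_parts, z_parts):
    """Percentage of the E isomer in a mixture given as an E:Z ratio."""
    total = e_parts + z_parts
    if total <= 0:
        raise ValueError("ratio must have a positive total")
    return 100.0 * e_parts / total


# The 9:1 E/Z photostationary state reported with riboflavin.
print(percent_e(9, 1))
# An 8:1 E/Z ratio, the level exceeded by the best photocatalysts.
print(round(percent_e(8, 1), 1))
```

Read this way, the "greater than 40% conversion" threshold and the "8:1" ratio sit on the same scale: an 8:1 mixture is a little under 90% E.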

Having identified a series of photocatalysts for the isomerization of (Z)-1a to (E)-1a in a semi-aqueous medium, we evaluated the simultaneous, cooperative photoisomerization and enzymatic reduction of (Z)-1a with the same photocatalysts (Fig. 2a). Moderate to high yields of 2a were obtained when a range of catalysts were used in the cooperative process. The highest conversions and yields were obtained when the cooperative reactions were conducted with 5% of flavin mononucleotide (FMN) or 5% of the cationic iridium(III) complexes [Ir(dmppy)2(dtbbpy)]PF6 (Ir-16), [Ir(dtbbpy)(ppy)2]PF6 (Ir-67) and [Ir(dtbppy)2(dtbbpy)]PF6 (Ir-80) (dmppy, 4-methyl-2-(4-methylphenyl)pyridine; dtbbpy, 4,4'-di-tert-butyl-2,2'-bipyridine; ppy, 2-phenylpyridine; dtbppy, 4-(tert-butyl)-2-(4-(tert-butyl)phenyl)pyridine).

The high yields and enantioselectivities obtained from the cooperative reduction of the model diester (Z)-1a encouraged us to investigate the cooperative reduction of other aryl diesters. We identified enzymes that preferentially reduce the E isomers of diesters 1b-f (Fig. 3a) in high yields and enantioselectivities (Supplementary Tables 4-8). These enzymes were then used in the cooperative reduction of the Z isomers of 1b-d with Ir-16 or FMN as a photocatalyst in the presence of blue light. High yields of product were obtained from the cooperative isomerization and reduction of each of the diesters, including those containing electron-rich ((Z)-1b) and electron-poor ((Z)-1d) aryl groups with either FMN or Ir-16 as photocatalysts and the enzymes YersER or XenB (Fig. 3a). The yields and enantiomeric excess (e.e.) of the products obtained from the cooperative reductions of the Z isomers of 1b-d were equivalent to those obtained from the enzymatic reduction of the E isomers of substrates 1b-d in the absence of the photocatalyst. This result suggests that the cooperative reduction of any E/Z mixture of these alkenes should give high yields of the reduced products and proceed with high enantioselectivity. This feature of the cooperative system was crucial to obtain high yields and enantioselectivities for the reduction of 1e and 1f, which were synthesized as inseparable mixtures of E and Z alkenes. The cooperative chemoenzymatic reduction of a 62:38 mixture of E and Z isomers of 1e afforded 2e in 74% yield and >99% e.e., and the cooperative reduction of a 61:39 mixture of E and Z isomers of 1f afforded 2f in 94% yield and 91% e.e. The yields and enantioselectivities from the cooperative reduction of (E/Z)-1e and (E/Z)-1f indicate that the reactions were stereoconvergent. The enzymatic reductions of 1e and 1f in the absence of a photocatalyst and light formed the reduced products in only 58% and 60% yield, which reflects reaction with only the E isomer of the E/Z mixture (Supplementary Tables 7, 8).

Fig. 3 | Scope of the cooperative photoisomerization and reduction and comparison to sequential reactions. a, Scope of the cooperative reduction of alkenes. Standard conditions: reactions were conducted with 0.5% ene-reductase, 1 U ml-1 glucose dehydrogenase, 1% Ir-16 or 5% FMN, 25 mM glucose, 0.2 mM NADP+, 50 mM Tris buffer at pH 7.5 and 10% DMSO at room temperature in the presence of blue light. PS is the percentage of E substrate present after the Z isomer (or E/Z isomer mixture) was irradiated with light in the presence of either 5% FMN or 1% Ir-16 for 24 h. In the case of 1p, (E)-1p was irradiated with blue light in the presence of FMN and Ir-16 for 24 h, and the percentage of (Z)-1p was determined. Deviations from standard conditions are as follows: 0.2% ene-reductase; 1% FMN; 3% Ir-16; the reaction was conducted in the absence of a photocatalyst in 5% DMSO, 95% 50 mM Tris buffer. Control experiments are included in Supplementary Tables 3-18. b, Comparison of sequential (1) and cooperative (simultaneous) (2) isomerization and reduction of cyanoacrylates 1i and 1j under standard conditions.

To determine if the cooperative reaction would give high yields and high enantioselectivities with alkenes other than diesters, the cooperative reductions of unsaturated compounds containing diverse combinations of functional groups were evaluated. The ene-reductases OPR1, TOYE, OYE2, YersER, and SYE1 preferentially reduced the E isomers of alkenes 1g-o to form 2g-o in high yields and with high enantioselectivities (Supplementary Tables 9-17). The cooperative reactions of the Z isomers of 1g-o were then conducted with Ir-16 or FMN in the presence of blue light. Figure 3a shows results from the combination of a photocatalyst and an enzyme that generated the products in the highest yields and with the highest enantioselectivities from the Z isomers of 1g-o. Good results were obtained for the cooperative reduction of β-cyano-α,β-unsaturated ester (Z)-1h, α-cyano-α,β-unsaturated esters (Z)-1i and (Z)-1j, amidoacrylate (Z)-1g, and amidocyanate (Z)-1k. High yields and enantioselectivities were also obtained from the reactions of cyanoketone (Z)-1l, keto-α,β-unsaturated esters (Z)-1m and (Z)-1n, and α-keto-unsaturated ester (Z)-1o. The cooperative reductions of amidoacrylate (Z)-1g and amidocyanate (Z)-1k are noteworthy because enzymatic reductions of alkenes containing Weinreb amides have not been reported to our knowledge. Control experiments showed that (Z)-1l, (Z)-1m, and (Z)-1n undergo enzymatic reduction to generate products in high yields and with high enantioselectivities in the presence of blue light but in the absence of an added photocatalyst (Supplementary Tables 14-16). In these cases, the alkenes (1l-n) isomerize in the presence of blue light alone (Supplementary Figs. 13, 22, 23).

© 2018 Springer Nature Limited. All rights reserved.

Fig. 4 | Derivatization of enantioenriched products. A summary of the previously reported and newly disclosed transformations of the products of the chemoenzymatic cooperative reduction of alkenes is shown. Reaction conditions for derivatizations are as follows: (1) 2a to 3a: 4.0 equiv. LiAlH4, tetrahydrofuran, RT, 5 h. (2) 2e to 3e: i, 1 trifluoroacetic acid-dichloromethane, 3 h, RT; ii, 17 equiv. triethylamine, dichloroethane, and 1.5 equiv. diphenylphosphoryl azide, 65°C, 2 h; iii, 0.10 equiv. Mo(O)2Cl2, tert-butanol, 65°C, 1 10 min. (3) 2g to 3g: 1.4 equiv. Cp2Zr(H)Cl (Cp, cyclopentadiene), 1.0 equiv. Zr: tetrahydrofuran:dichloromethane, 3 h, RT. (4) 2h to 3h was previously reported. (5) 2i to 3i was previously reported. (6) 2j to 3j was previously reported. (7) 2m to 3m was previously reported. (8) 2p to 3p: 1:1 H2O:H2SO4, 120°C, 6 h.

The cooperative enzymatic reductions of alkenes 1i and 1j shown in Fig. 3b illustrate the benefits of a cooperative system over two sequential reactions. The photoisomerization of (Z)-1i and (Z)-1j with Ir-16 or FMN results in E/Z mixtures in which the less reactive Z isomer is the major component. As a result, low yields were obtained from the sequential isomerization and reduction of (Z)-1i and (Z)-1j. The e.e. of 2i obtained from the sequential isomerization and reduction of (Z)-1i was slightly lower than the e.e. obtained from the enzymatic reduction of the pure, more reactive isomer (E)-1i, and this difference probably results from the slow reduction of the Z isomer of 1i after rapid consumption of the E isomer during the second stage of the sequential process (Supplementary Fig. 30). By contrast, the simultaneous, cooperative reduction of (Z)-1i and (Z)-1j generated products 2i and 2j in high yields and with high enantioselectivities.
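
The difference between the sequential and cooperative modes can be illustrated with a toy kinetic model. The sketch below is not the authors' analysis: it assumes first-order kinetics, an enzyme that reduces only the E isomer, and hypothetical rate constants (`K_ZE`, `K_EZ`, `K_RED`) chosen so that the photostationary state contains 30% E, that is, the less reactive Z isomer dominates.

```python
def simulate(z, e, p, k_ze, k_ez, k_red, t_end, dt=0.01):
    """Explicit-Euler integration of Z <-> E -> P; returns final (z, e, p)."""
    for _ in range(int(round(t_end / dt))):
        iso = (k_ze * z - k_ez * e) * dt  # net Z -> E flux (photocatalyst)
        red = k_red * e * dt              # E -> P flux (E-selective enzyme)
        z, e, p = z - iso, e + iso - red, p + red
    return z, e, p


# Hypothetical rate constants (arbitrary time units), not measured values.
K_ZE, K_EZ, K_RED = 0.3, 0.7, 1.0


def sequential_yield(t_end=50.0):
    # Stage 1: light on, no enzyme. Stage 2: light off, enzyme on.
    z, e, _ = simulate(1.0, 0.0, 0.0, K_ZE, K_EZ, 0.0, t_end)
    _, _, p = simulate(z, e, 0.0, 0.0, 0.0, K_RED, t_end)
    return p


def cooperative_yield(t_end=50.0):
    # Isomerization and reduction run at the same time.
    _, _, p = simulate(1.0, 0.0, 0.0, K_ZE, K_EZ, K_RED, t_end)
    return p


print(f"sequential:  {sequential_yield():.2f}")
print(f"cooperative: {cooperative_yield():.2f}")
```

In this model the sequential yield is capped by the E fraction at the photostationary state, whereas in the cooperative mode the enzyme keeps removing E, so the isomerization continues to feed it until nearly all of the starting Z isomer is converted.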

An enzyme that selectively reduces the E isomer of trifluoromethyl cyanate 1p in high yield and with high enantioselectivity could not be identified; however, OYE2 selectively reduces the Z isomer of 1p in this way (Supplementary Table 18). To determine if a cooperative reaction could convert (E)-1p to 2p in high yield and with high enantioselectivity, the ability of (E)-1p to undergo photoisomerization in the presence of Ir-16 and blue light was evaluated. An 86:14 ratio of the E/Z isomers of 1p was established after 24 h of irradiation with blue light in the presence of Ir-16. Although the photostationary state of 1p favours the less reactive E isomer, the cooperative reaction of (E)-1p to 2p occurred in high yield and with high enantioselectivity with Ir-16 as the photocatalyst and OYE2 as the reductase. This example illustrates an important benefit of the cooperative chemoenzymatic reaction: the system enables either the isomerization of a Z alkene with simultaneous enzymatic reduction of the E isomer or the isomerization of an E alkene with simultaneous enzymatic reduction of the Z isomer. The conversion of (E)-1p to 2p is also noteworthy because of the dearth of reductions of an alkene with isolated enzymes to generate a product containing a stereogenic centre substituted with a trifluoromethyl group.

To demonstrate the synthetic value of this new method, preparative-scale cooperative reactions were conducted with 1% Ir-16 and the following alkenes: 40-60 mg of (Z)-1a, a 62:38 mixture of the E and Z isomers of 1e, (Z)-1g, (Z)-1h, and (Z)-1o. Product 2a was isolated in 87% yield and >99% e.e., 2e in 79% yield and >99% e.e., 2g in 71% yield and >99% e.e., 2h in 96% yield and 92% e.e., and 2o in 79% yield and >99% e.e.

The enantioenriched compounds that were obtained from the cooperative isomerization and enzymatic reduction system can be transformed into a variety of biologically active molecules and valuable synthetic intermediates (Fig. 4). For example, the selective hydrolysis of the tert-butyl ester in compound 2e followed by a Curtius rearrangement yielded 3e, a β-amino ester. The β-amino ester was isolated in 90% yield without a notable reduction in enantiomeric excess (98% e.e.). The selective reduction of the Weinreb amide in compound 2g with Schwartz's reagent yielded methyl 4-oxo-2-phenylbutanoate (3g) in 74% isolated yield and 99% e.e. This compound is an intermediate in the synthesis of protein kinase inhibitors and microsomal triglyceride transfer protein inhibitors. Acid-catalysed hydrolysis of the nitrile in 2p yielded 4,4,4-trifluoro-3-phenylbutanoic acid (3p) in 96% yield and >99% e.e. This versatile synthetic intermediate has previously been used in the synthesis of inhibitors of beta amyloid production. Reduction of 2a with lithium aluminium hydride formed 2-phenylbutane-1,4-diol (3a) in 93% yield and >99% e.e. This diol is a synthetic precursor to inhibitors of matrix metalloproteases. Other products of the cooperative reactions are known precursors to biologically active compounds. For example, 2i and 2j have been converted previously to γ-amino acids (including baclofen and phenibut) and lactams, 2h has been converted into an amino ester and a γ-lactam, and 2m has been converted into calyxolanes, which are cyclic ether natural products. Thus, the products of the cooperative chemoenzymatic reduction are precursors to valuable synthetic intermediates using both newly disclosed and previously reported transformations.

The combination of a photocatalytic process and an enzymatic reaction enables transformations that combine the reactivity of chemical catalysts with the selectivity of enzymes. Two features of photocatalytic reactions make them, in general, suitable for chemoenzymatic processes: first, photochemical reactions typically occur at or near room temperature, making them compatible with the thermal requirements of enzymatic systems, and second, photocatalysts often react by mechanisms, such as outer-sphere electron transfer or energy transfer, that involve intermediates that are stable towards water and the functional groups in proteins. These considerations, in combination with the renewed interest in photocatalysis and the rapidly advancing tools of molecular biology, should create opportunities for the development of a wide range of new cooperative chemoenzymatic transformations.


Data availability 
Data supporting the findings of this study, Supplementary Figures 1-42 and


Marine heatwaves under global warming


Thomas L. Frölicher, Erich M. Fischer & Nicolas Gruber


Marine heatwaves (MHWs) are periods of extreme warm sea surface temperature that persist for days to months and can extend up to thousands of kilometres. Some of the recently observed marine heatwaves revealed the high vulnerability of marine ecosystems and fisheries to such extreme climate events. Yet our knowledge about past occurrences and the future progression of MHWs is very limited. Here we use satellite observations and a suite of Earth system model simulations to show that MHWs have already become longer-lasting and more frequent, extensive and intense in the past few decades, and that this trend will accelerate under further global warming. Between 1982 and 2016, we detect a doubling in the number of MHW days, and this number is projected to further increase on average by a factor of 16 for global warming of 1.5 degrees Celsius relative to preindustrial levels and by a factor of 23 for global warming of 2.0 degrees Celsius. However, current national policies for the reduction of global carbon emissions are predicted to result in global warming of about 3.5 degrees Celsius by the end of the twenty-first century, for which models project an average increase in the probability of MHWs by a factor of 41. At this level of warming, MHWs have an average spatial extent that is 21 times bigger than in preindustrial times, last on average 112 days and reach maximum sea surface temperature anomaly intensities of 2.5 degrees Celsius. The largest changes are projected to occur in the western tropical Pacific and Arctic oceans. Today, 87 per cent of MHWs are attributable to human-induced warming, with this ratio increasing to nearly 100 per cent under any global warming scenario exceeding 2 degrees Celsius. Our results suggest that MHWs will become very frequent and extreme under global warming, probably pushing marine organisms and ecosystems to the limits of their resilience and even beyond, which could cause irreversible changes.

There is mounting evidence that global warming is leading to more frequent and intense heatwaves over land, increasing the risk of severe and in some cases irreversible impacts. In comparison, we know much less about how heatwaves in the ocean unfold in time and what the associated impacts are. Although there is a rapidly growing literature on individual events, the underlying drivers and the degree to which they can be attributed to global warming are currently not well known. This knowledge gap is of considerable concern given the high vulnerability of marine ecosystems and fisheries, but also human societies, to such events.

Fig. 1 | Simulated changes in MHW characteristics for different levels of global warming. a, c-f, Results are shown for the global aggregated annual mean probability ratio (a), duration (c), maximum intensity (d), cumulative mean intensity (e) and fraction of attributable risk (f) of MHWs exceeding the 99th preindustrial percentile. b, Ratio of the mean spatial extent at global warming conditions to that at 1861-1880 conditions. In all panels, the simulated MHW characteristics are plotted against simulated global mean atmospheric surface temperature changes since 1861-1880. The thinner lines represent individual model projections, whereas the thicker lines represent multi-model averages for the RCP 8.5 and RCP 2.6 scenarios. For all models, the historical simulations are merged with the RCP 2.6 and RCP 4.5 simulations. The time series are smoothed with a 20-year running mean and the year labels represent the central year of two decades. The cumulative CO2 emissions (orange; in gigatons of C) corresponding to different global warming levels are shown in a, approximated using the RCP 8.5 ensemble average (see Methods).

Fig. 2 | Observed and modelled trends in MHW characteristics over the satellite data taking period. a-c, The black lines show the observed changes in the global aggregated probability ratio (a), maximum annual intensity (b) and the ratio of the annual spatial extent at different years to that at 1982-2016 conditions (c) of MHWs exceeding the 1982-2016 99th percentile. The thick red lines indicate the simulated multi-model mean changes and the thin red lines the individual models of MHWs exceeding the 1982-2016 99th percentile. The observed global mean SST changes since 1982-2016 are shown in a as a black dashed line. d-f, The histograms show simulated 35-year trends of MHW characteristics in the preindustrial control simulations (see Methods for calculation details). The black and red vertical lines show the 35-year observed and simulated trends in 1982-2016 of MHWs exceeding the 1982-2016 99th percentile, and the blue vertical lines show the 99th percentile (labelled as '99P') of the probability density distribution of the preindustrial control simulation trends. The relative changes in the annual spatial extent are calculated as the ratio between the actual mean spatial extent and the average over the 1982-2016 period. Only simulations following the RCP 8.5 scenario are considered here because they best represent observed greenhouse gas emissions since 2006.

One of the first documented impacts of an MHW was the Mediterranean Sea heatwave event in 2003, which led to extensive mortality of benthic marine communities. Other prominent examples are the record-high ocean warming off the coast of Western Australia in early 2011, the 2012 MHW in the northwest Atlantic, the persistent 2013-2015 extreme warm anomaly of the northeastern Pacific and the 2015/2016 record-warm anomaly across most of the tropical and extratropical oceans. MHWs have caused changes in biological production, toxic algal blooms, regime shifts in reef communities, mass coral bleaching and mortalities of commercially important fish species, with cascading impacts on economies and societies.

Here, we detect past changes and assess future ones in different MHW characteristics using (i) remotely sensed daily global sea surface temperature (SST) data covering the period 1982-2016, and (ii) daily output from twelve fully coupled global Earth system models (ESMs) covering the period 1861-2100 (see Methods). We identify an event as an MHW when the SST exceeds its local 99th percentile, as determined from daily data from either preindustrial model output or from satellite-based observations and model output over the 1982-2016 period. We then quantify the annual mean probability ratio (the fraction by which the number of MHW days per year has changed), relative change in the annual spatial extent (the average area of an individual heatwave), maximum annual intensity (maximum exceedance of the 99th percentile), annual mean duration (number of days of exceedance) and annual cumulative mean intensity (the product of the duration and the mean intensity of exceedance). We analyse three distinct periods: the preindustrial period (Fig. 1), the satellite data taking period (1982-2016; Fig. 2) and the future (Figs. 1, 3). We focus on summertime MHWs (that is, hottest days of the year), as many biological processes depend on the absolute temperature. The definition of MHWs needs to be altered, however, when MHWs in colder months can have an impact on biological processes.
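
The metrics defined above can be sketched for a single location as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain: it uses one fixed 99th-percentile threshold from a baseline series (no seasonal or summertime selection, no spatial extent), and the names `percentile` and `mhw_metrics` are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def mhw_metrics(baseline_sst, sst, q=99.0):
    """MHW metrics for a daily SST series relative to a baseline threshold."""
    thr = percentile(baseline_sst, q)
    events, run = [], []
    for temp in sst:
        if temp > thr:
            run.append(temp - thr)   # exceedance on an MHW day
        elif run:
            events.append(run)       # a run of MHW days just ended
            run = []
    if run:
        events.append(run)
    n_days = sum(len(ev) for ev in events)
    base_prob = 1.0 - q / 100.0      # exceedance probability in the baseline
    return {
        "threshold": thr,
        # Ratio of the MHW-day frequency to the baseline frequency.
        "probability_ratio": (n_days / len(sst)) / base_prob,
        # Largest exceedance of the threshold.
        "max_intensity": max((max(ev) for ev in events), default=0.0),
        # Mean number of consecutive days of exceedance per event.
        "mean_duration": n_days / len(events) if events else 0.0,
        # Duration times mean exceedance equals the summed exceedance.
        "cumulative_intensity": [sum(ev) for ev in events],
    }
```

For example, calling `mhw_metrics(control_run, warm_year)` with a long baseline series and one year of daily values (both hypothetical variable names) returns a probability ratio above 1 whenever more than 1% of the days exceed the baseline threshold.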

In preindustrial times, the ESMs suggest that a typical MHW (with reference to preindustrial climatology) lasted 11 days (intermodel range, 6-14 days), had an intensity of up to 0.4°C (0.3-0.5°C) and a cumulative mean intensity of 3°C d (2-4°C d) (Fig. 1, Extended Data Table 1). MHWs occur coherently with a typical spatial extent of 4.2 x 10° km² (1.2 x 10°-7.0 x 10° km²). Under the present-day 1°C global warming scenario, these models project a nine-fold (6-12) increase in the probability of occurrence of an MHW and a three-fold (1-3) increase in its spatial extent. Further, they project that the duration and the maximum annual and cumulative mean intensity have increased to 25 days (15-33 days), 0.8°C (0.6-1°C) and 13°C d (8-18°C d), respectively.

These century-scale changes can be put into perspective by determining the trend that they imply over the 35-year period 1982-2016, for which we have satellite observations. To this end, we change the reference for the definition of MHWs to this period. This has virtually no impact on trend computation, but affects the magnitude of the MHW characteristics. Over these 35 years, the models simulate mean changes in the probability ratio and maximum annual intensity of +2.0 (….1-2.8) and +0.07°C (0.01-…), respectively, and relative changes in the annual spatial extent of +0.53 (0.17-1.00) (thick red lines in Fig. 2). These multi-model mean trends are at the high end or outside the range of those expected from internal variability (histograms in Fig. 2d-f), which is determined from the preindustrial control simulations. This indicates that the climate change signal could be strong enough to be detected in observations.

The corresponding 35-year trends in the satellite observations (thick black lines in Fig. 2d-f) are of similar magnitudes as the simulated ones (red lines in Fig. 2d-f). The observations reveal a significant increase in the probability ratio (+1.29 ± 0.28 per 35 years; P < 0.01 using a two-sided t-test), maximum intensity (+0.15 ± 0.05°C per 35 years; P < 0.01) and spatial extent (+0.66 ± 0.13 per 35 years; P < 0.01) (thick black lines in Fig. 2). These observed trends are statistically significantly outside the model-based estimate of trend variability arising from internal variability, but are within the simulated intermodel uncertainty for trends arising from simulations that include anthropogenic forcing (thin red lines in Fig. 2d-f). Assuming that the model-based estimate of internal variability is accurate, we can conclude with high confidence that the observed trends in the MHW days, maximum intensity and spatial extent of MHWs are largely caused by long-term ocean warming. Support for this conclusion comes from the fact that SST variations also have a large effect on the year-to-year variability of the different MHW characteristics. The observed temporal evolution of the annual mean SST (black dashed line in Fig. 2a) has strong correlations with the probability ratio (r² = 0.6 for global SST and probability ratio) and the spatial extent (r² = 0.65) of MHWs, but relatively weak correlation with their maximal intensity (r² = 0.36).

Fig. 3 | Regional changes in probability of MHW days for different global warming levels. a-d, Changes in the probability of MHW days exceeding the preindustrial 99th percentile for a global warming level of 1°C (a), 2°C (b) and 3.5°C (c). To show that the occurrence of MHWs is mainly driven by a simple shift of the whole temperature distribution, in d we have added the local annual SST change that is consistent with a 3.5°C global warming to the preindustrial SST distribution. e-h, Changes are regionally aggregated over the western Pacific warm pool (e), the Arctic Ocean at >75° N (f), large marine ecosystems (g) and the Southern Ocean at 45° S-65° S (h). Box plots indicate the multi-model mean, minimum and maximum changes in probability and their colour indicates the value of the probability ratio, using the same colour coding as for a-d. Black bars represent the multi-model minimum and maximum of the global averaged probability changes. The western Pacific warm pool region is highlighted by a purple solid contour and the large marine ecosystems are shown with black contours adjacent to the continents in coastal waters. The grey contours in a-d highlight pattern structures. The large marine ecosystems provide 95% of the world's annual marine fishery yields and have been developed to enable ecosystem-based marine resource management within ecologically bounded transnational areas. The maps were created using the NCAR Command Language (https://www.ncl.ucar.edu).

The satellite records also allow us to assess the characteristics of the modelled MHWs, allowing us to establish confidence levels for the projections. The modelled spatial pattern in the probability ratio, maximum intensity and the frequency distribution of the spatial extent of MHWs are comparable to the observed ones over the satellite data taking period (Extended Data Figs. 1, 2), giving us confidence in the corresponding projections. By contrast, the duration, cumulative mean intensity and absolute spatial extent of the MHWs are less well captured by the models, with substantial biases in the corresponding patterns (Extended Data Figs. 1, 2). This indicates that we need to be more careful when interpreting the modelled changes of these characteristics.

For the future and all ocean basins, the ESMs project more frequent, extensive, intense and longer-lasting MHWs (Fig. 1a-e, Fig. 3, Extended Data Tables 1, 2 and Extended Data Fig. 3) (here the reference period is set back to preindustrial times). The magnitude of these changes scales with the global mean temperature and the cumulative CO2 emissions that drive this global warming (Fig. 1). This scaling is independent of the warming path, that is, it does not depend on whether a particular warming is reached sooner (RCP 8.5, high-emission scenario; see Methods) or later (RCP 2.6, low-emission scenario compatible with the Paris Agreement). It also does not depend on the reference period, as the use of the satellite reference period would only shift this curve slightly to the left (Extended Data Fig a). This allows us to assess the future projections in terms of warming levels rather than the time when this warming is reached.

For 3.5°C warming, the probability of occurrence of an MHW is 41
times (intermodel range, 36-45 times) higher than in preindustrial
times (Fig. 1, Extended Data Table 2). In other words, a one-in-a-
hundred-days event at preindustrial levels is projected to become a
one-in-three-days event at this level of global warming. The spatial
extent of the annual mean is projected to become 21 (15-29) times
larger, its duration to increase to 112 days (92-129 days), and its
maximum intensity to rise to 2.5°C (2.1-2.9°C) (Fig. 1b-d, Extended Data
Table 1). The projected increase in maximum intensity is smaller than
the increase in global mean temperature owing to the substantially
lower rate of warming by the surface ocean compared to land. The
increase in the duration and intensity also leads to a strong increase in
the cumulative mean intensity of MHWs of 164°C d (126-214°C d)
(Fig. 1e, Extended Data Table 1).
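The conversion from a probability ratio to an event frequency can be sketched in a few lines. This is only an illustration of the arithmetic behind the statement above, assuming a baseline daily exceedance probability of 1% (the 99th percentile); the function names are hypothetical and not taken from the study.

```python
def exceedance_probability(baseline_probability: float, probability_ratio: float) -> float:
    """Daily exceedance probability after scaling the baseline by the ratio."""
    # A probability cannot exceed 1, so cap the scaled value.
    return min(1.0, baseline_probability * probability_ratio)


def return_period_days(daily_probability: float) -> float:
    """Average number of days between exceedances."""
    return 1.0 / daily_probability


baseline = 0.01  # assumption: 99th percentile, one day in a hundred
for ratio in (1.0, 41.0):
    p = exceedance_probability(baseline, ratio)
    print(f"PR = {ratio}: p = {p:.2f}, one day in {return_period_days(p):.1f}")
```

With a ratio of 41 the daily probability is 0.41, that is, roughly one day in two to three, which the text rounds to a one-in-three-days event.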

These large increases in the different MHW characteristics are
substantially reduced if warming is kept below 2°C, or even below 1.5°C.
The probability of occurrence for an MHW under the 1.5°C warming
scenario is only 40% of that under 3.5°C warming. The relative change
in the spatial extent of a typical MHW would be 25%, the duration 35%
and the maximum intensity 45% of those at 3.5°C.

The probability of MHWs is projected to increase almost everywhere,
and the increase is largest in the tropics and the Arctic Ocean
and smallest in the Southern Ocean (Fig. 3). The main reason for the
large changes in probability in the tropics, and especially in the western
Pacific warm pool, is the small variations in SST in these areas, both
seasonally and from year to year. As a result, the same changes in
annual mean SST lead to much larger changes in the probability of
exceeding the 99th percentile. The same applies to the Arctic Ocean,
where SST variations below year-round sea ice are very small. This is
in contrast to the Southern Ocean, where surface waters are projected to
stay relatively cool and therefore the probability ratio does not increase
much under all warming levels. The projected increase in the probability
of MHWs in the coastal large marine ecosystems (indicated as black
coastal regions in Fig. 3d) has a similar magnitude to the global increase
under 2°C warming.
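The dependence of the probability ratio on local variability can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that daily SST anomalies are Gaussian (the study itself makes no distributional assumption) and that warming shifts the mean without changing the spread; the function name and the example values are hypothetical.

```python
from statistics import NormalDist


def probability_ratio(warming: float, sigma: float, percentile: float = 0.99) -> float:
    """Probability ratio for a Gaussian SST distribution shifted by `warming`.

    The threshold is the `percentile` of the unshifted (control) distribution.
    """
    control = NormalDist(mu=0.0, sigma=sigma)
    threshold = control.inv_cdf(percentile)
    warmed = NormalDist(mu=warming, sigma=sigma)
    p_warmed = 1.0 - warmed.cdf(threshold)
    p_control = 1.0 - percentile
    return p_warmed / p_control


# Same 1 degree shift, two hypothetical levels of day-to-day variability.
low_variability = probability_ratio(warming=1.0, sigma=0.5)
high_variability = probability_ratio(warming=1.0, sigma=2.0)
print(f"sigma = 0.5: PR = {low_variability:.1f}")
print(f"sigma = 2.0: PR = {high_variability:.1f}")
```

For the same shift in the mean, the low-variability case gives a probability ratio roughly ten times larger than the high-variability case, which is the mechanism described above for the tropics and the ice-covered Arctic.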

Because of the large increase in the probability ratio with warming,
the simulated fraction of attributable risk (that is, the anthropogenic
contribution to the probability of an event) reaches 0.87 (0.78-0.91)
already under a present-day level of 1°C warming (Fig. 1f, Extended
Data Table 2). This implies (under the assumption that the models
simulate naturally occurring MHWs with fidelity) that 87% of the currently
occurring MHWs (defined relative to preindustrial conditions) can be
attributed to global warming. Because this warming is primarily driven
by anthropogenic emissions of greenhouse gases, there is a direct
link between human action and the simulated increase in MHWs; this
supports our conclusion drawn from the satellite data. Clearly, any
specific MHW event still arises from the natural variability in the climate
system, but the present-day level of global warming has substantially
increased the odds of an MHW occurring. The simulated fraction of
attributable risk approaches unity (0.94-0.97) already at 2°C, implying
that essentially all MHWs are due to anthropogenic warming at this or
higher levels of warming.

The changes in the occurrence of MHWs are mainly driven by the
global-scale shift in mean SSTs. We demonstrate this by adding the
simulated spatial warming pattern that is consistent with a global warming
of 3.5°C to the results from the preindustrial control run (Fig. 3d).
This yields probability ratio values and patterns that are similar to the
results from the transient simulations. It also implies that changing the
reference period would not change the relationship between the different
MHW characteristics and the amount of warming (Extended Data
Fig. 4a). A notable exception is the northern Arctic Ocean, where the
SST remains close to freezing temperature during boreal winter months
even under the RCP 8.5 scenario. This slightly damps the increase in
the probability ratio that would be expected from a global-scale shift
in the mean SST.

An important assumption in our analyses is that the employed ESMs
simulate MHWs in a sufficiently realistic manner. We consider our
results for the probability ratio, maximum intensity and the relative
changes in the spatial extent of MHWs to be robust, especially given
the good agreement with observations (Fig. 2 and Extended Data
Fig. 1) and the relatively small intermodel spread in MHW projections.
However, the simulated MHWs generally last longer and are spatially
more extensive than observed ones (Extended Data Figs. 1, 2), which is
probably caused by the relatively coarse resolution of the ESMs. High-
resolution coupled models are needed to resolve mesoscale processes
in the atmosphere and the ocean that may be critical to improve the
representation of the duration and spatial extent of MHWs. In addition,
the conclusion that global warming will lead to a strong increase in all
MHW characteristics does not depend on how an MHW is defined
(Extended Data Fig. 5), but the quantitative results of MHWs can vary
substantially with that definition.

An increase in MHWs will probably increase the risk of severe,
pervasive and long-lasting impacts on marine organisms, especially
on those with reduced mobility and high vulnerability, such as coral
reefs, and those living at low latitudes, where many marine species live
close to their upper thermal limits. However, the responses of marine
organisms and ecosystems to MHWs can be variable and difficult to
predict owing to species- and system-specific responses. Therefore,
better understanding of the response of marine organisms and
ecosystems to MHWs and extreme events in other stressors is urgently
needed to assess the full risk for marine organisms and ecosystems
under global warming.
METHODS
We analyse daily SST and surface atmospheric temperature data from simulations
(using the first ensemble member r1i1p1) of twelve coupled ESMs that were
considered in the fifth phase of the Coupled Model Intercomparison Project (CMIP5)
and for which the output necessary to analyse changes in daily SST was available
(Extended Data Table 3). All model simulations were run over the historical 1861-
2005 period and over the 2006-2100 period, following both a high-emission
scenario (RCP 8.5; RCP, representative concentration pathway) and a low-emission
scenario compatible with the Paris Agreement (RCP 2.6).

In addition, we use the National Oceanic and Atmospheric Administration's
daily optimum interpolation SST dataset, version 2.0, obtained from the Advanced
Very High Resolution Radiometer and covering the period 1 January 1982 to
31 December 2016 (www.ncdc.noaa.gov/oisst/; accessed on 6 July 2017). The
dataset combines observations from different platforms, such as satellites, ships
and buoys, and includes bias adjustment of satellite and ship observations to
compensate for platform differences and sensor biases. For comparison with the
coarse-resolution models, the 0.25° x 0.25° satellite data were regridded daily onto
a regular 1° grid by averaging over the 1° grid cells before calculating the
characteristics of the MHWs.
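The regridding step amounts to block averaging: each 1° cell is the mean of the 4 x 4 block of 0.25° cells that it contains. The following is a minimal sketch under simplifying assumptions that are not stated in the text: the fine grid aligns exactly with the coarse grid, all cells in a block are weighted equally, and missing values (for example, land) are marked with `None` and skipped. The function name is hypothetical.

```python
def block_average(field, factor=4):
    """Average non-overlapping factor x factor blocks of a 2D grid.

    `field` is a list of rows; `None` marks missing cells (for example, land).
    A coarse cell is `None` only if every fine cell inside it is missing.
    """
    n_rows, n_cols = len(field), len(field[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            values = [
                field[a][b]
                for a in range(i, i + factor)
                for b in range(j, j + factor)
                if field[a][b] is not None
            ]
            row.append(sum(values) / len(values) if values else None)
        coarse.append(row)
    return coarse


# An 8 x 8 patch at 0.25° resolution becomes a 2 x 2 patch at 1° resolution.
fine = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
coarse = block_average(fine)
print(coarse)
```

A production workflow would normally also apply area weights within each block, although at this resolution the correction is small.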

We define an event as an MHW when the daily SST exceeds the 99th percentile
(a one-in-a-hundred-days event). We test the sensitivity of the results by also using
the 90th (a one-in-ten-days event), the 99.9th (a one-in-2.74-year event) and
the 99.99th (a one-in-27.4-years event) percentiles (Extended Data Fig. 5). The
percentiles are calculated for each grid cell from multi-centennial preindustrial
control simulations (most simulations are for 500 years or longer). This ensures
that even the local 99.99th percentile is well defined. The same preindustrial
control simulation is used to define the reference global mean temperature relative
to which the warming targets are computed. Changing the reference period to
present-day (that is, 2007-2026; ±10 years centred on today) would just shift the
values on the x axis in all panels of Fig. 1, but would not change the relationship
between the different MHW characteristics and the amount of global warming
(Extended Data Fig. 4a). Because some models have constant year-round SSTs
in a few grid cells under sea ice in the preindustrial control simulations, grid cells
in which the average yearly number of MHW days over the entire control simulation
deviates by more than 5% from the theoretical number (for example, 3.65 days for
the 99th percentile) are masked out. For the analysis of atmospheric heatwaves
over land, we use the same definition as for MHWs. For analysis over the satellite
data taking period, we use the entire 1982-2016 period as the baseline period for
both the models and the satellite data.
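The correspondence between percentile thresholds and event frequencies, and the masking rule for grid cells with unrealistic exceedance counts, can be written down directly. This is a sketch assuming a 365-day year and a relative tolerance of 5% for the masking rule; the function names are hypothetical.

```python
DAYS_PER_YEAR = 365.0  # assumption: leap days are ignored


def expected_days_per_year(percentile: float) -> float:
    """Theoretical number of days per year above a given percentile."""
    return DAYS_PER_YEAR * (1.0 - percentile / 100.0)


def return_period_years(percentile: float) -> float:
    """Average time in years between single-day exceedances."""
    return 1.0 / expected_days_per_year(percentile)


def is_masked(mean_days_per_year: float, percentile: float, tolerance: float = 0.05) -> bool:
    """True if the simulated count deviates too much from the theoretical one."""
    expected = expected_days_per_year(percentile)
    return abs(mean_days_per_year - expected) > tolerance * expected


for p in (90.0, 99.0, 99.9, 99.99):
    print(f"{p}th percentile: {expected_days_per_year(p):.4f} days per year, "
          f"one day in {return_period_years(p):.2f} years")
```

A grid cell with constant SST never exceeds its own percentile, so its count is zero and `is_masked(0.0, 99.0)` returns `True`.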

The usage of a percentile-based threshold allows the quantification of MHWs
across locations that differ in variability. An absolute threshold would only be
relevant in terms of impacts in some regions but not in others. By using
percentile-based characteristics, no assumption is made regarding the underlying
probability temperature distribution, and potential model-observation biases in
the mean and higher-order statistical moments of the probability temperature
distribution are implicitly taken into account. This increases our confidence in
the simulated probability ratio, but the simulated spatial extent and duration (and
intensity) of MHWs may still differ from observations. Our definition differs
from that proposed by Hobday et al., who define an MHW by using a much
lower seasonally varying percentile threshold (90th rather than 99th), but impose
a duration of at least five days. Relative to the results obtained with our definition,
the definition of Hobday et al. would lead to an increase of the number of
heatwave days, including the cold seasons, because the vast majority of our heatwaves
last longer than five days. However, using their definition would not change our
conclusion about the robust increase in all MHW metrics under global warming,
because this result is essentially insensitive to the percentile threshold that we
choose (Extended Data Fig. 5).

For each MHW, we calculate a series of characteristics, such as the duration
(in days; number of days of percentile threshold exceedance), the maximum
intensity (in °C; maximum SST anomaly with respect to the percentile threshold
over the duration of the heatwave), the spatial extent (in km²), the cumulative
mean intensity (in °C d; the mean intensity multiplied with the duration
of an event), and the probability ratio, PR = P1/P0, where P1 is the probability
of exceeding a relative threshold at any given point in time (for example, today)
and P0 the probability of exceeding that threshold during the preindustrial control
or satellite climatological period. The cumulative mean intensity may indicate
the integrated impact of an MHW on an organism's health; a similar measure,
the degree heating days or weeks, is commonly used to identify areas where
substantial coral bleaching is likely to occur. We then calculate annual statistics,
including the number of MHW days per year, the changes in the annual averaged
spatial extent of an MHW relative to 1861-1880 or the satellite climatological
period, the annual mean duration of single contiguous MHW events in a given
year, the maximum annual intensity, the annual mean cumulative mean intensity
and the annual mean fraction of attributable risk (FAR = 1 - 1/PR).
Because the observed global warming primarily results from human influence,
we can attribute the changes in the occurrence of MHWs to human-induced
global warming. The FAR was initially introduced to represent a fraction of the
probability of individual observed events. Here, we extend the FAR framework
to the global scale to represent the probability for a class of events exceeding
a certain threshold over the globe. For a given MHW, the probability ratio can
be interpreted as a change in the odds of the occurrence of local SST anomalies
exceeding a certain local threshold. The regional or global aggregated probability
ratio expresses a change in the global occurrence of SST anomalies exceeding local
thresholds.
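The per-event characteristics defined above can be computed from a single daily SST series once a threshold is fixed. The sketch below is only an illustration for one grid cell with a constant threshold; the function names and the sample series are hypothetical, and the fraction of attributable risk is written as 1 - 1/PR, which is equivalent to 1 - P0/P1 for a probability ratio PR = P1/P0.

```python
def find_events(sst, threshold):
    """Split a daily SST series into contiguous runs of days above the threshold.

    Each event is returned as the list of daily anomalies relative to the threshold.
    """
    events, current = [], []
    for value in sst:
        if value > threshold:
            current.append(value - threshold)
        elif current:
            events.append(current)
            current = []
    if current:
        events.append(current)
    return events


def describe(event):
    """Duration, maximum intensity and cumulative mean intensity of one event."""
    duration = len(event)
    mean_intensity = sum(event) / duration
    return {
        "duration_days": duration,
        "max_intensity": max(event),
        # Mean intensity multiplied by duration, in degrees C times days.
        "cumulative_mean_intensity": mean_intensity * duration,
    }


def probability_ratio(p1, p0):
    """Ratio of the exceedance probability now (p1) to the reference one (p0)."""
    return p1 / p0


def fraction_of_attributable_risk(pr):
    """FAR = 1 - 1/PR."""
    return 1.0 - 1.0 / pr


sst = [20.0, 21.5, 22.0, 20.5, 19.0, 21.2, 20.0]  # hypothetical daily values
events = find_events(sst, threshold=21.0)
for event in events:
    print(describe(event))
print(fraction_of_attributable_risk(probability_ratio(0.10, 0.01)))
```

In this example the first event lasts two days with a maximum intensity of 1.0°C and a cumulative mean intensity of 1.5°C d, and a tenfold increase in probability corresponds to a FAR of 0.9.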

We calculate the MHW properties by using slightly different frameworks.
The probability ratio, maximum intensity, duration, cumulative mean intensity
and FAR are defined when local (grid-cell) SST exceeds the local 99th percentile
and where adjacent grid cells can have different values. The intensity and
duration refer to the properties of a contiguous event, but the probability ratio refers
to MHW days per year, regardless of how they are distributed across different
events. For the spatial extent, we aggregate adjacent grid cells that are above the
99th percentile together to form a single event. To calculate the spatial extent of
individual MHWs, we isolate the individual MHWs per day using the function
skimage.measure.label of the Python image processing tool scikit-image. The global
estimate of these characteristics is calculated with an area-weighted average across
all ocean grid points in each year from 1861 to 2100. All MHW characteristics are
calculated on the native model grid, which differs in resolution across the models,
but multi-model means and globally aggregated characteristics are calculated and
shown on a regular 1° x 1° grid.
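Isolating individual events on a given day is a connected-component labelling problem on the mask of grid cells above the threshold. The sketch below uses a small pure-Python flood fill as a stand-in for a library labelling routine, so that it runs without dependencies. Its assumptions are not taken from the text: cells are joined only through shared edges (4-connectivity), the grid does not wrap around in longitude, and cell area is approximated from the cosine of the cell-centre latitude. All names are hypothetical.

```python
import math
from collections import deque

EARTH_RADIUS_KM = 6371.0


def cell_area_km2(lat_deg, resolution_deg=1.0):
    """Approximate area of a latitude-longitude cell centred at lat_deg."""
    d = math.radians(resolution_deg)
    return EARTH_RADIUS_KM ** 2 * d * d * math.cos(math.radians(lat_deg))


def label_events(mask):
    """Label 4-connected regions of True cells; 0 means no event."""
    n_rows, n_cols = len(mask), len(mask[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    n_events = 0
    for i in range(n_rows):
        for j in range(n_cols):
            if not mask[i][j] or labels[i][j]:
                continue
            n_events += 1
            labels[i][j] = n_events
            queue = deque([(i, j)])
            while queue:  # breadth-first flood fill from the seed cell
                a, b = queue.popleft()
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = a + da, b + db
                    if (0 <= x < n_rows and 0 <= y < n_cols
                            and mask[x][y] and not labels[x][y]):
                        labels[x][y] = n_events
                        queue.append((x, y))
    return labels, n_events


def event_areas(labels, n_events, lats):
    """Sum the cell areas belonging to each labelled event."""
    areas = [0.0] * n_events
    for i, row in enumerate(labels):
        for label in row:
            if label:
                areas[label - 1] += cell_area_km2(lats[i])
    return areas


mask = [
    [True, True, False, False],
    [False, True, False, True],
    [False, False, False, True],
]
lats = [0.5, -0.5, -1.5]  # hypothetical cell-centre latitudes per row
labels, n_events = label_events(mask)
print(n_events, event_areas(labels, n_events, lats))
```

The example mask contains two separate events, one of three cells and one of two cells, and each event's extent is the sum of its cell areas.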

We usually express the changes in MHW characteristics as changes for
particular global warming levels (that is, 1°C, 2°C and 3.5°C). These global warming
levels are calculated for each model and scenario individually as the difference
between the simulated global annual mean atmospheric surface temperature averaged
over the 20-year period centred around the year when the respective global warming
level is reached and the simulated global mean atmospheric surface temperature
averaged over the 1861-1880 period.
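Finding the year in which a given warming level is reached can be sketched as a search over centred 20-year means. This is an illustration under stated assumptions rather than the procedure used in the study: the window covers the ten years before and the ten years starting at the centre year, the first year that meets the level is returned, and the temperature series is synthetic. Function names are hypothetical.

```python
def centred_mean(series, centre, window=20):
    """Mean over `window` values: half before the centre index, half from it on."""
    half = window // 2
    chunk = series[centre - half:centre + half]
    return sum(chunk) / len(chunk)


def year_warming_level_reached(years, temps, level, baseline=(1861, 1880), window=20):
    """First year whose centred mean exceeds the baseline mean by `level`."""
    start, end = years.index(baseline[0]), years.index(baseline[1])
    reference = sum(temps[start:end + 1]) / (end - start + 1)
    half = window // 2
    for k in range(half, len(years) - half + 1):
        if centred_mean(temps, k, window) - reference >= level:
            return years[k]
    return None  # level not reached within the series


# Synthetic series: flat until 1900, then warming by 0.02 degrees per year.
years = list(range(1861, 2101))
temps = [14.0 + 0.02 * max(0, y - 1900) for y in years]
print(year_warming_level_reached(years, temps, level=2.0))
```

For this synthetic series the 2°C level is reached in 2001, and a level that is never attained returns `None`.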

The cumulative CO2 emissions corresponding to the different global warming
levels (orange ticks on horizontal axis in Fig. 1a) are approximated using calculated
cumulative CO2 emissions from the RCP 8.5 average of eight models for which
necessary data were available (Extended Data Table 3). This means that 500 Gt C
corresponds to 1.6°C, 1,000 Gt C to 2.8°C, 1,500 Gt C to 4.0°C and 2,000 Gt C
to 4.9°C. No uncertainties are assigned to these values. We note that these eight
models as a class have a relatively low transient climate response to cumulative
carbon emissions and therefore cumulative carbon emission estimates for a certain
global warming level are relatively large.

To test whether the observed multi-decadal trends over the satellite data taking
period are different from what would be expected from internal variability, we
compare the observed global aggregated trends in the probability ratio, maximum
intensity and spatial extent with the probability density function of 35-year long
trends derived from the 12 multi-century control simulations of the different ESMs.
In total, we calculated 6,460 35-year trends.
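The comparison of an observed trend with trends arising from internal variability can be sketched as follows. The example assumes that 35-year trends are taken from overlapping windows shifted by one year and fitted by ordinary least squares; the text does not specify either choice. The control series is random noise standing in for a control simulation, and all names are hypothetical.

```python
import random


def linear_trend(values):
    """Ordinary least-squares slope per time step."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


def sliding_trends(series, window=35):
    """Trends of all overlapping windows of the given length."""
    return [linear_trend(series[i:i + window])
            for i in range(len(series) - window + 1)]


def fraction_at_least(control_trends, observed_trend):
    """Share of control trends that are as large as the observed one."""
    hits = sum(1 for t in control_trends if t >= observed_trend)
    return hits / len(control_trends)


rng = random.Random(0)
control = [rng.gauss(0.0, 0.1) for _ in range(500)]  # stand-in for a 500-year run
trends = sliding_trends(control)
print(len(trends), fraction_at_least(trends, observed_trend=0.02))
```

A 500-year series yields 466 overlapping 35-year trends; an observed trend that exceeds essentially all of them is unlikely to arise from internal variability alone.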

We also used a ten-member ensemble simulation of the NCAR-DOE CESM
model to show that internal variability may induce uncertainty at the local level,
but plays a negligible role in explaining the global changes in the different MHW
characteristics; all ten realizations yield very similar results (Extended Data
Fig. 4). We also show that our simulated changes in the MHW characteristics do
not depend on the choice of calculation method for the climatological 99th
percentiles. In fact, the simulated changes in the MHW characteristics are similar when
determining the local 99th percentiles from a simulation of the GFDL ESM2M
model forced with observed solar and volcanic boundary conditions but with
greenhouse gases and aerosols concentration set to preindustrial levels (Extended
Data Fig. 6). Only the NCAR-DOE CESM and GFDL ESM2M models provide the
daily output necessary to analyse the sensitivity of the results to internal variability
and to the calculation of the climatological baseline period.

We used the western equatorial Pacific biome definition of Fay and McKinley
to highlight the western Pacific warm pool region in Fig. 3. The biogeographical
biomes are defined by distinct SSTs, maximum mixed-layer depths and
summer chlorophyll concentrations and capture patterns of large-scale
biogeochemical function at the basin scales.

Under any level of warming, MHWs are projected to occur much more
frequently than land-based heatwaves (Extended Data Table 2). The probability of
MHW days is about times higher than that for land-based heatwave days under
°C global warming, even though the global SSTs are projected to increase by
only 0.55°C per degree of surface air temperature warming over land (Extended
Data Fig. 7). The larger increase in the probability ratio is obtained because the
temperature variability over land is much larger than over the ocean, leading to
a smaller signal-to-noise ratio. This is evidenced by the 8.0°C difference between
the 99th percentile and the annual mean air temperature averaged over the global
land surface at preindustrial times, which is much larger than the 3.7°C difference
over the ocean.

In general, the probability ratio, and therefore also the FAR, increases the most
for very rare extremes (Extended Data Table 2); that is, they increase much more
if MHWs are defined with more extreme preindustrial percentile thresholds
(Extended Data Fig. 5). For example, the probability ratio is 9 (intermodel range,
6-12) for moderate MHWs (defined with the 99th preindustrial percentile) and
141 (47-296) for the rarest MHWs (99.99th preindustrial percentile) under 1°C
global warming.
Code availability. The code used to produce the figures in this paper is available
from the corresponding author upon request.
Data availability. The CMIP5 data used for this study can be accessed at hip!)
emi ll gov/ and the satellite SST observations can be accessed at
www.ncdc.noaa.gov/oisst/. Other datasets generated during the current study are
available from the corresponding author upon reasonable request.
Extended Data Fig. 1 | Observed and simulated MHW characteristics
exceeding the 1982-2016 99th percentile, averaged over the 1982-2016
period. a, b, Differences between the 99th percentile in SST and the
annual mean SST. c, d, Annual mean duration of MHWs. e, f, Maximum
annual intensity of MHWs. g, h, Annual cumulative mean intensity of
MHWs. Satellite-derived patterns (a, c, e, g) and CMIP5 multi-model
mean patterns (b, d, f, h). The black contours in all panels highlight the
pattern structures. The spatial correlation between the CMIP5 multi-
model mean and the satellite-based estimates is r² = 0.80 for a and b,
P05 for c and d, r² = 0.43 for e and f, and r² = 0.18 for g and h.
Extended Data Fig. 2 | Spatial extent of observed and simulated MHWs
over the satellite data taking period. a, Histogram of the spatial extent
of satellite-observed MHWs above the climatological (1982-2016) 99th
percentile for the 1982-1998 (blue) and 1999-2016 (red) period. b, The
spatial pattern of the MHW with the largest extent in the satellite data
taking period (1982-2016), which occurred on 2 September 2015 in the
North Pacific and was associated with 'the blob'. It had a spatial
extent of about twice the area of the United States (that is, 18.5 x 10^6 km²).
Shown are SST anomalies above the 1982-2016 climatological 99th
percentile on 2 September 2015. The colour bar shows degrees Celsius.
c, Comparison between satellite-based observations (black line) and
simulations (red histogram) of the spatial extent of MHWs above the
climatological 99th percentile over the 1982-2016 period. The number of
MHWs (y axis) is normalized with the total number of MHWs. Deeper red
colour indicates a greater number of overlapping models.
Extended Data Fig. 3 | Simulated multi-model mean changes in
different MHW characteristics exceeding the preindustrial 99th
percentile since preindustrial times for different global warming levels.
... duration (b, e, h, k) and annual cumulative mean intensity (c, f, i, l) of
MHWs for global warming of 1°C (a-c), 1.5°C (d-f), 2°C (g-i) and
3.5°C (j-l). The black contours highlight the pattern of changes.
Extended Data Fig. 4 | Simulated changes in different MHW
characteristics exceeding the preindustrial 99th percentile. The data
were obtained from a 10-member ensemble simulation with NCAR-DOE
CESM. a-d, The probability ratio (a), annual mean duration (b),
maximum annual intensity (c) and annual cumulative mean intensity (d)
of MHWs. The black lines show the individual ensemble members. The
red line in a shows the probability ratio versus global warming for a
reference period that is defined as the 99th percentile over the 2007-2026
period, obtained using all ten ensemble members. The ensemble members
are initialized from different starting points (ocean, sea ice, land and
atmosphere) in the preindustrial control simulation. The simulations
follow the RCP 8.5 scenario over the 21st century. The time series are
smoothed with a 20-year running mean.
Extended Data Fig. 5 | Simulated changes in MHW characteristics for
different global warming levels and different extreme thresholds.
a, c-e, Global annual mean probability ratio (a; logarithmic scale),
duration (c), maximum intensity (d) and cumulative mean intensity (e) of
MHWs versus different extreme thresholds for different changes in global
mean surface air temperature. b, Changes in the ratio of the mean spatial
extent of MHWs between global warming conditions and 1861-1880
conditions. Simulations following only the RCP 8.5 scenario are shown.
The shaded areas indicate the maximum range simulated by the CMIP5
models.
Extended Data Fig. 6 | Comparison between simulated changes in
MHW characteristics exceeding the 99th percentile from a natural-
forcing simulation and from a preindustrial control simulation using
GFDL ESM2M forced with the RCP 8.5 scenario over the 21st century.
a-d, The probability ratio (a), annual mean duration (b), maximum
annual intensity (c) and annual cumulative mean intensity (d) of MHWs.
The red line shows the simulated changes exceeding the 99th percentile
from a natural-forcing simulation of GFDL ESM2M forced with observed
solar and volcanic boundary conditions, but with greenhouse gases and
aerosol concentrations set to preindustrial. The black line shows the
simulated changes exceeding the preindustrial 99th percentile. The time
series are smoothed with a 20-year running mean.
Extended Data Table 1 | Simulated changes in the annual mean spatial extent of MHWs relative to preindustrial times, and simulated
annual mean duration, maximum and cumulative mean intensity of MHWs exceeding the preindustrial 99th percentile for different global
warming levels
Extended Data Table 2 | Simulated probability ratio and fraction of attributable risk estimates averaged over the ocean and over land for
different global warming levels and for different preindustrial percentile thresholds (99th and 99.99th)

Probability ratio
                99th percentile                                 99.99th percentile
Warming         Ocean                   Land                    Ocean
1°C             8.94 (6.70/12.15)       5.56 (3.44/8.75)        141 (47/296)
1.5°C           15.65 (11.34/24.48)     9.71 (6.63/13.48)       418 (197/079)
2°C             22.80 (16.31/30.76)     18.89 (10.71/18.59)     293 (448/1679)
3.5°C           41.19 (36.92/44.91)     26.50 (22.36/90.26)     2560 (2094/2918)

Fraction of attributable risk
                99th percentile                                 99.99th percentile
Warming         Ocean                   Land                    Ocean
1°C             0.87 (0.78/0.91)        0.78 (0.65/0.87)        0.88 (0.84/0.88)
1.5°C           0.88 (0.90/0.96)        0.88 (0.83/0.92)        1 (0.99/1)
2°C             0.95 (0.94/0.97)        0.92 (0.90/0.95)        1
3.5°C           0.97 (0.97/0.98)        0.96 (0.95/0.96)        1
Extended Data Table 3 | Global climate models used

Model
CanESM2 (800)"
CSIRO-Mk3-6-0 (500)
GFDL-CM3 (500)
GFDL-ESM2G (500)*
GFDL-ESM2M (500)*
HadGEM2-ES (500)
IPSL-CM5A-LR (1000)*
IPSL-CM5A-MR (300)"
MIROC-ESM (630)*
MPI-ESM-LR (1000)*
MPI-ESM-MR (150)
MRI-CGCM3 (500)
Categorical perception of colour signals in a songbird

Eleanor M. Caves, Patrick A. Green, Matthew N. Zipple, Susan Peters, Sönke Johnsen & Stephen Nowicki


In many contexts, animals assess each other using signals that
vary continuously across individuals and, on average, reflect
variation in the quality of the signaller. It is often assumed that
signal receivers perceive and respond continuously to continuous
variation in the signal. Alternatively, perception and response
may be discontinuous, owing to limitations in discrimination,
categorization or both. Discrimination is the ability to tell two
stimuli apart (for example, whether one can tell apart colours close
to each other in hue). Categorization concerns whether stimuli are
grouped based on similarities (for example, identifying colours
with qualitative similarities in hue as similar even if they can be
distinguished). Categorical perception is a mechanism by which
perceptual systems categorize continuously varying stimuli, making
specific predictions about discrimination relative to category
boundaries. Here we show that female zebra finches (Taeniopygia
guttata) categorically perceive a continuously variable assessment
signal: the orange to red spectrum of male beak colour. Both
predictions of categorical perception were supported: females
(1) categorized colour stimuli that varied along a continuum
and (2) showed increased discrimination between colours from
opposite sides of a category boundary compared to equally
different colours from within a category. To our knowledge, this
is the first demonstration of categorical perception of signal-based
colouration in a bird, with implications for understanding avian
colour perception and signal evolution in general.

First described for the perception of phonemes in human speech,
categorical perception was later shown to function in the perception
of auditory signals in other animals. With regard to colour, animals
may discriminate among colours but nevertheless treat them as
similar, and colour categorization may influence decision-
making thresholds. Thus, animals can categorize colours, forming
discrete groups of similar yet discriminable variants across the visible
spectrum. However, the hallmark of categorical perception (increased
discrimination of variants between categories relative to
variants from within) has not been demonstrated for natural variation
in colour-based signals.

Carotenoid-based colouration is commonly used in visual signalling
across many taxa, although its function in assessment signalling is best
described in mate choice in birds and fish. Individuals vary in their
ability to acquire and metabolize carotenoids; therefore, variation in
carotenoid-based colouration has been linked to variation in the quality
of the signaller. Carotenoid-based beak colouration in male zebra
finches ranges from light orange to dark red, beak redness correlates
positively with variation in cell-mediated immunity, and females
show a mating preference for males with red versus orange beaks.
Previous studies have tested how receivers respond to both ends of a
carotenoid-based colour continuum, but whether they perceive
variation continuously (responding differently to any detectable differences
in colour) or exhibit categorical perception is unknown.

To create stimuli that vary continuously along a spectrum from red
to orange, we selected eight Munsell colours (Pantone) previously used
to describe the colour of zebra finch beaks. We modelled chromatic
distance (ΔS) using the receptor noise-limited model of colour
discrimination (Extended Data Tables 1, 2). To ensure that the
selected colours are approximately equidistant from one another
when accounting for zebra finch spectral sensitivity and ambient
light (Extended Data Figs. 1, and Supplementary Information), we
visualized ΔS in a chromaticity space in which the Euclidean distance
between stimuli plotted in an x-y plane equals chromatic distance for
a trichromatic viewer (Fig. 1).

We used a food-reward protocol to test for categorization (which is
sometimes referred to as labelling) and discrimination of the eight
stimuli spanning the orange-red colour spectrum. We created discs of
Munsell paper comprising two semi-circular halves of either the same
or different colours (hereafter 'solid' and 'bicolour', respectively). Once
females had been trained to flip over these discs to access a food reward,
we trained them to flip bicolour discs first, before any solid discs: in
essence, birds learned to recognize 'bicolour' versus 'solid' discs, rather
than particular colour combinations. In experimental trials, we
presented females with a foraging grid that had twelve wells, six of which
were covered with discs: two solid discs for each of the two colours,
Fig. 1 | Munsell colours used to create stimuli. Colours were
approximately equally spaced in chromaticity space and were closer
to their nearest neighbour than to any other colour. Dots show mean
chromaticity coordinates for each colour; ellipses show one standard
deviation in the X1 and X2 dimensions (14 measurements per colour;
Supplementary Information); numbers between dots show chromatic
distance (ΔS) between colours (mean ± s.d.). Ellipse colour corresponds
to relevant Munsell colour (exact colours in the figure may vary). Inset,
foraging grid of an example trial.
Fig. 2 | Categorization trials suggested a boundary between colours 5
and 6. The boundary is indicated by vertical lines on the x axis. Females
were 31% more likely to pass 1|6 than 1|5 trials and 34% more likely to
pass 8|5 than 8|6 trials; n = 26 birds in three independent cohorts. Box
plots depict median (horizontal line inside box), 25th and 75th percentiles
(box), 25th/75th percentiles ± 1.5 x interquartile range (whiskers), and
outliers (circles). Horizontal grey line indicates the expected pass
frequency if birds flip discs by chance.


and two bicolour discs comprising the same colours as the solid discs
(Fig. 1 inset). Birds passed a trial if they flipped both bicolour discs
before any solid discs, indicating that they perceived the two colours
on the bicolour disc as different.
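The pass criterion implies a simple chance level that can be enumerated directly. The sketch below assumes, for illustration only, that a bird with no ability to tell the discs apart flips all six discs in a uniformly random order; the chance level actually plotted by the authors is not stated in this passage, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def chance_pass_probability(n_bicolour=2, n_solid=4):
    """Probability that all bicolour discs are flipped before any solid disc."""
    discs = ["B"] * n_bicolour + ["S"] * n_solid
    orders = list(permutations(range(len(discs))))
    passes = sum(
        1 for order in orders
        if all(discs[i] == "B" for i in order[:n_bicolour])
    )
    return Fraction(passes, len(orders))


print(chance_pass_probability())
```

Under this assumption a trial with two bicolour and four solid discs is passed by chance with probability 1/15, or about 7%.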

We first performed categorization experiments to establish the location of potential perceptual boundaries. We tested females using bicolour discs that included colour 1 in combination with all other colours (that is, 1|2, 1|3, 1|4 and so on) and, separately, colour 8 in combination with all other colours (that is, 8|7, 8|6, 8|5 and so on). We determined the proportion of trials passed for each comparison. For both the 1|X and 8|X comparisons (where X is any other colour), pass frequency increased when the chromatic distance increased between colours 1 or 8 and the comparison colour X. The greatest difference occurred, however, when comparing the pass frequencies for 1|5 and 1|6, and likewise between 8|5 and 8|6, suggesting that there is a potential boundary between colours 5 and 6 (Fig. 2). A linear mixed model (Table 1) demonstrated that comparing colour 1 or 8 with the colour immediately preceding versus immediately following the putative boundary resulted in the same change in pass frequency as would moving 10.5 ΔS units, approximately equal to three colour steps within a category (mean ΔS between colour steps = 3.6).
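
The equivalence quoted above is simple arithmetic: dividing the boundary effect, expressed in ΔS units, by the mean ΔS between adjacent colours gives the number of within-category colour steps it corresponds to. A minimal sketch in Python, using only the two values reported in the text (the variable names are illustrative and not taken from the study):

```python
# Values reported in the text.
boundary_effect_delta_s = 10.5  # boundary effect expressed in chromatic distance units
mean_step_delta_s = 3.6         # mean chromatic distance between adjacent colour steps

# Number of within-category colour steps equivalent to crossing the boundary.
equivalent_steps = boundary_effect_delta_s / mean_step_delta_s
print(round(equivalent_steps, 1))  # 2.9, that is, about three colour steps
```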

We next determined whether discrimination of colour differences crossing the putative 5–6 category boundary was increased compared to discrimination of equal colour differences that did not cross the


Table 1 | Mixed model results for categorization and greyscale discrimination data
Colour Intercept 0.01

ΔS 0.02 78 0.0001
Across 5–6 boundary 0.24 308 0.0001
Fig. 3 | Discrimination of stimuli that are two colour spaces apart increased across the 5–6 boundary. a, b, Mean pass frequency across all individuals (a) and for each individual (b) was greater for comparisons that crossed the boundary versus those that did not. a, Green boxes in the grey-shaded area are comparisons that cross the boundary. Sample sizes (number of birds, across three independent cohorts) are shown within each box. Michelson contrasts are shown in parentheses. Boxes, whiskers, circles and horizontal grey line are as described in Fig. 2.


boundary. We presented females with colour pairs that did or did not cross the hypothesized boundary, and that were two colour spaces apart (that is, 1|3, 2|4, 3|5 and so on; Fig. 3a). The pass frequency for comparisons that crossed the 5–6 boundary was 26 ± 6 percentage points higher (mean ± s.d.) than comparisons that did not (paired t-test, t25 = 9.26, P < 0.0001; Fig. 3b). We found similar results when stimuli were one or three colour spaces apart (Extended Data Figs 3, 4) and when we combined all categorization and discrimination data into a single linear mixed model (Extended Data Table 3).

Our colour stimuli varied in brightness because real zebra finch beaks of different colour are also not of equal brightness. Several lines of evidence support the conjecture that brightness alone does not completely explain our results. First, we built a linear mixed model comparable to the one presented in Table 1, but which, in addition to chromatic distance (ΔS), included Michelson contrast (a measure of brightness ratios) between colour pairs to explain pass frequency instead of the binary variable indicating the 5–6 boundary. This model performed substantially worse (Δ Akaike information criterion = 13) than the model that included the binary parameter of crossing the boundary. Additionally, our data show that several discrimination comparisons had either similar Michelson contrasts but different pass frequencies (4|6 compared to 6|8 (Fig. 3) and 5|6 compared to 6|7 (Extended Data Fig. 3)), or equivalent pass frequencies with different Michelson contrasts (4|6 compared to 5|7 (Fig. 3) and 3|6 compared to 4|7 and 5|8 (Extended Data Fig. 4)). Lastly, we performed one-apart discrimination experiments (n = 8 birds) using shades of grey (Extended Data Fig. 5 and Supplementary Information) selected to match the red-orange colours in zebra finch double cone quantum catch (Extended Data Table 1), an estimate of perceived brightness in passerines. In these greyscale experiments, pass frequency was significantly predicted by Michelson contrast: birds demonstrated increased discrimination for both 5|6 and 6|7 compared to greyscale pairs with a lower contrast (Table 1 and Extended Data Fig. 5). Together, these pieces of evidence suggest that brightness may contribute to category formation but cannot alone explain the perceptual categories that we observed.
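
For readers unfamiliar with the brightness measure used above, Michelson contrast is conventionally defined as the difference between two brightness values divided by their sum. The sketch below is illustrative only: it assumes that the brightness of each stimulus is summarized by a single non-negative number (for instance a double cone quantum catch), and the example values are hypothetical rather than taken from the study.

```python
def michelson_contrast(q_a: float, q_b: float) -> float:
    """Michelson contrast between two brightness values (ranges from 0 to 1)."""
    if q_a < 0 or q_b < 0 or (q_a + q_b) == 0:
        raise ValueError("brightness values must be non-negative and not both zero")
    return abs(q_a - q_b) / (q_a + q_b)

# Hypothetical brightness values for the two halves of a bicolour disc.
contrast = michelson_contrast(0.30, 0.20)
print(round(contrast, 3))  # 0.2
```

Because the measure is a ratio, it is unchanged if both brightness values are scaled by the same factor, which is why relative (rather than absolute) quantum catches are sufficient.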

Our results demonstrate categorical perception of colour associated with an assessment signal. We found that a category boundary influences the perception of two colour stimuli as similar or different, and results in differential discrimination between stimuli depending on their location relative to the category boundary. Discrimination improved with increasing chromatic distance between colours (that is, variants within a category were not perceived as identical), but increased most sharply across the category boundary (that is, variants from across the boundary were perceived as most different).

We did not explicitly test whether categorical perception arises at the level of the photoreceptor or higher, such as in the brain. However, a wavelength discrimination function derived from electroretinographic data from the pigeon Columba livia, a reasonable proxy for the zebra finch, suggests that photoreceptor sensitivity alone probably does not explain categorical perception (Extended Data Fig. 6). We also did not directly test female preference for males in relation to the categories found here. Nonetheless, these findings have important implications for the understanding of colour perception and encourage further work exploring whether and how categorical perception interacts with selection on signal form and function, particularly in the context of assessment (Extended Data Fig. 7).


Online content
METHODS
The goal of this study was to test whether female zebra finches perceive colours along a red-to-orange spectrum in a continuous or categorical fashion. We selected eight colours from red to orange that correspond to male zebra finch beak colours and that are roughly equidistant from one another in a chromatic space based on zebra finch spectral sensitivity. These eight colours were made into disc stimuli that were either solid in colour (that is, made of two halves of the same colour) or bicolour (that is, one colour on one half of the disc and another colour on the other half). Disc stimuli were used in a food-reward protocol. Female zebra finches were first trained to flip over discs covering wells in a foraging grid to find a food reward (millet seeds, as used previously), using discs made from the colours at the endpoints of our red-orange continuum (colours 1 and 8). Initially, all wells (both solid and bicolour) were baited with millet seeds to reward the birds any time they successfully flipped a disc. After being trained on this task, females were further trained to flip only bicolour discs; we used the same stimuli made of colours 1 and 8 but this time baited only wells that were covered with bicolour discs. Once females passed six out of seven consecutive trials under this protocol, indicating they had learned to search for food only under bicoloured discs, we then conducted trials using different combinations of our eight selected colours. If females perceived two colours as distinct, they would flip the bicoloured discs first to gain a food reward; if they did not perceive two colours as distinct, they would not preferentially flip the bicoloured discs first. For details, see 'Behavioural trials'.
Birds used in this study. All birds in this study were sexually mature female zebra finches (age range: 3–50 months at start of experimental testing) from a colony maintained by R. Mooney at Duke University (IACUC A258-14-10). After transfer from the colony, birds were housed singly in cages (6 23 23 cm; Prevue Pet) outfitted with two wooden perches and a cuttlebone. Lighting was controlled during trials (see 'Behavioural trials') and food was removed 5 h before trials to ensure that birds would be motivated to perform the task. Outside of trial times, birds were kept on a 15 h light–dark cycle (consistent with the light cycle of the birds' home colony), with overhead lighting provided by fluorescent bulbs (Ecolux with Starcoat SP 35/41, colour temperature 3,500–4,100 K, General Electric) with ballast (Hi-Lume 3D/Eco-10, Lutron Electronics) operating at 50–60 Hz. Birds were given zebra finch food (Kaytee Forti-Diet Pro Health Finch diet) and water ad libitum. Rooms were maintained at 25–27 °C. Testing was done under Duke University IACUC protocol A004-17-01.
Selecting stimulus colours. Previous work has shown that the range of beak colouration in zebra finch males can be represented by red and orange shades in the Munsell colour system. Munsell colours are defined by three parameters: hue, value (brightness) and chroma (saturation). Previous work on zebra finches identified a large set of Munsell colours that, by the human eye, approximate the colours of zebra finch beaks. In particular, this set of Munsell colours consists of colours with hues 6.25R–3.75YR, values 3–6 and chromas 10–14. Notably, the values used in these previous studies spanned a range of hues from yellow-orange to red, and specifically did not use Munsell colours of the same brightness because real male beaks of different hues are also of different brightness.

These Munsell colours are based on beak colours across two different colony populations of zebra finches and capture most of the variation within those populations, although the authors of the studies note that the beaks of occasional individuals were found to be outside that range. Because the goal of our study was to examine how female zebra finches perceive a range of colours that are similar to the range spanned by the colours of male zebra finch beaks, we used this previous work as a starting point to choose 40 Munsell colour samples from within the set outlined above. Reflectance spectra from each colour sample were measured using an integrating sphere with a built-in tungsten-halogen light source (ISP-REF, Ocean Optics). All measurements were taken with reference to a Spectralon 99% white reflectance standard (Labsphere).

For each of the Munsell colours, we calculated relative photon catches (that is, how many photons are absorbed by a given cone type in response to a visual scene) for zebra finch short-, medium- and long-wave cones. Photon catches were calculated over the wavelengths 400–700 nm, using zebra finch spectral sensitivity curves, an ambient light spectrum and the reflectance spectrum of each colour. Thus, photon catch Q for receptor type r in response to colour c was calculated using

$$Q_{r,c} \propto \int_{400}^{700} S_r(\lambda)\, R_c(\lambda)\, I(\lambda)\, \mathrm{d}\lambda$$

where $S_r$ is the sensitivity of receptor type r, $R_c$ is the reflectance of colour c, $\lambda$ denotes the wavelength and $I$ is the irradiance of the illuminant. We use proportionality throughout, because we did not require an absolute quantum catch, and the constant factors remain the same across different receptor types. As for our measure of ambient light, we used a standard tungsten bulb illuminance spectrum (CIE Illuminant A, colour temperature 2,856 K; spectrum in Supplementary Information), which is very similar to the spectrum of ambient light provided by the tungsten bulbs in our experiment (see below, Extended Data Fig. 1 and Supplementary Information). We used sensitivity data from the zebra finch to calculate photon catches for the short-, medium- and long-wave and double cones (Extended Data Table 1). We did not calculate photon catches for the ultraviolet cone because (1) male zebra finch beaks reflect minimal ultraviolet light; (2) female zebra finches do not require ultraviolet radiation to discriminate and rank red-orange colouration; (3) our own measurements confirmed that Munsell paper does not reflect strongly in the ultraviolet portion of the spectrum; and (4) the reflected ultraviolet radiance from our Munsell chips, under experimental lighting conditions, was essentially zero (see Extended Data Fig. 2).
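
The photon catch calculation described above amounts to integrating, over wavelength, the product of receptor sensitivity, stimulus reflectance and illuminant irradiance. A minimal numerical sketch is given below, assuming all three spectra are sampled at the same wavelengths and using the trapezoidal rule; the function name and the toy spectra are hypothetical and are not the study's data or code.

```python
def photon_catch(wavelengths, sensitivity, reflectance, irradiance):
    """Relative photon catch: trapezoidal integral of S * R * I over wavelength."""
    n = len(wavelengths)
    if not (n == len(sensitivity) == len(reflectance) == len(irradiance)):
        raise ValueError("all spectra must be sampled at the same wavelengths")
    integrand = [s * r * i for s, r, i in zip(sensitivity, reflectance, irradiance)]
    total = 0.0
    for k in range(n - 1):
        width = wavelengths[k + 1] - wavelengths[k]
        total += 0.5 * (integrand[k] + integrand[k + 1]) * width
    return total

# Toy spectra sampled every 100 nm from 400 to 700 nm (hypothetical numbers).
wl = [400, 500, 600, 700]
sens = [0.0, 0.2, 1.0, 0.1]   # receptor sensitivity
refl = [0.1, 0.1, 0.5, 0.8]   # stimulus reflectance
irr = [1.0, 1.0, 1.0, 1.0]    # illuminant irradiance
q = photon_catch(wl, sens, refl, irr)
print(q)
```

Scaling the illuminant by a constant scales every photon catch by the same constant, which is why only proportional (relative) catches are needed when comparing receptor types under one light source.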

In tests of categorical perception, stimuli should be roughly equidistant from one another in perceptual space. Munsell colours are equidistant from the perspective of human colour vision, but not necessarily from that of a zebra finch. Therefore, to calculate the chromatic distance between colour stimuli, we used the receptor noise-limited (RNL) model of colour discrimination, which uses photon catches calculated from the spectral sensitivity of the relevant animal's visual system to calculate ΔS, a measure of chromatic distance between colours (equivalent to just-noticeable differences, JNDs). We visualized ΔS using a perceptually uniform, two-dimensional space based on both hue and saturation (chroma), in which the Euclidean distance between two colours is equivalent to the RNL model-derived chromatic distance (ΔS) between colours. Plotting chromatic distance in this two-dimensional space is only relevant for trichromatic vision, which in our case was appropriate given that we did not incorporate quantum catches from ultraviolet cones (see above). The RNL model states that colour discrimination is primarily limited by photoreceptor noise, such that a Euclidean distance of 1 in the chromatic space is approximately equal to one standard deviation of receptor noise, or one JND. Given that this method uses photon catches from single cones (short-, medium- and long-wave), we performed a separate analysis to examine perceived brightness based on double cones (see below).
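
As an illustration of how a chromatic distance can be obtained from photon catches, the sketch below implements the standard trichromatic form of the receptor noise-limited model (Vorobyev and Osorio), in which the contrast in each channel is the log ratio of the two photon catches and each channel is weighted by its noise. The Weber fractions and photon catches shown are hypothetical placeholders, not the values used in the study.

```python
import math

def rnl_delta_s(q_a, q_b, weber):
    """Chromatic distance (in JNDs) between stimuli A and B for a trichromat.

    q_a, q_b: photon catches (short, medium, long) for the two stimuli.
    weber:    noise (Weber fraction) in each of the three receptor channels.
    """
    # Contrast in each channel: log ratio of the two photon catches.
    df = [math.log(a / b) for a, b in zip(q_a, q_b)]
    e1, e2, e3 = weber
    numerator = (e1 ** 2 * (df[2] - df[1]) ** 2
                 + e2 ** 2 * (df[2] - df[0]) ** 2
                 + e3 ** 2 * (df[0] - df[1]) ** 2)
    denominator = (e1 * e2) ** 2 + (e1 * e3) ** 2 + (e2 * e3) ** 2
    return math.sqrt(numerator / denominator)

# Hypothetical photon catches and channel noise values.
noise = (0.10, 0.07, 0.05)
colour_a = (0.10, 0.30, 0.60)
colour_b = (0.10, 0.25, 0.70)
print(rnl_delta_s(colour_a, colour_b, noise))
```

Only differences between channels enter the numerator, so two stimuli that differ purely in overall intensity have a chromatic distance of zero; this is why brightness has to be examined separately.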

In summary, the eight colours that we used (see Supplementary Information for reflectance spectra) were based on previously published comparisons with male zebra finch beaks and were approximately equally spaced in a chromatic space based on zebra finch spectral sensitivities. Throughout, we refer to these colours as 1 (the darkest, red end of the range) through 8 (the brightest, orange end of the range).
Sensitivity analyses. To ensure that the eight colours that we selected are not equidistant in chromatic space simply owing to the specific parameters of the zebra finch spectral sensitivities and the tungsten lighting conditions that we used, we additionally plotted the selected colours in chromatic space using different conditions (all spectra and sensitivity curves are provided in the Supplementary Information), including the zebra finch ultraviolet light-sensitive (UVS) cone type retina under both (1) tungsten and (2) daylight (D65) illuminants; (3) the zebra finch UVS cone type retina using the spectrum of halogen light present in the experimental rooms; (4) the zebra finch UVS cone type retina after applying a von Kries adaptation based on a neutral grey background to account for colour constancy mechanisms; (5) another UVS cone type retinal visual system (the starling Sturnus vulgaris); (6) the average UVS cone type retina (data from previously published supplementary online material); and (7) the average violet light-sensitive (VS) cone type retina (from previously published supplementary online material), which is the other primary type of retina found in birds (Extended Data Table 2). Overall, chromatic distances between selected Munsell colours remained relatively consistent even when changing the spectral sensitivity and lighting conditions, as well as when accounting for colour constancy, with the distance between colours 5 and 6 never being the largest jump. Thus, chromatic distance alone cannot explain the category boundary that we observed.
Behavioural trials. Room set-up. Birds were housed in individual cages, with multiple birds in each room. Before trials, we placed opaque barriers between adjacent cages so that birds could not see their closest neighbours perform any task. Individuals were able to see other birds across the room, although they could not see the task that those birds were performing. This set-up prevented birds from seeing how other birds were performing the task while ensuring that they could see other birds, which was beneficial given that zebra finches are highly social. During trials, the room overhead lights were turned off and birds were allowed approximately 10 min to acclimate to the experimental lighting. During experimental trials, each cage was illuminated from above by a halogen lamp (colour temperature 2,900 K, model number HEPC-6136, Philips Lighting) positioned a set distance from the foraging grid (spectrum in Supplementary Information). The light passed through vellum paper hung in front of the light source to ensure that each cage had even, diffuse illumination (Extended Data Fig. 1). All trials were recorded on video (Logitech Webcam Pro 9000, Logitech). On trial days, food was removed from the cages before trials, which began at 14:00, to ensure that birds would be motivated to attempt the task.


Disc design. Our experimental stimuli were discs 2.5 cm in diameter, made of two semicircles of Munsell colour sheets, glued with their straight edges together to create a full circle. The discs were covered with a clear epoxy cover. A clear vinyl disc (1.3 cm diameter, Sem high) was attached to the bottom of the disc to ensure it fit precisely into the wells of the foraging grid. We created two types of discs: solid, in which both semicircular halves were the same colour, and bicolour, in which the two semicircular halves were made of different colours. Bicolour discs were named for the colours of their two halves; that is, a disc made up of half colour 1 (the far red) and half colour 8 (the far orange) would be referred to as 1|8.

The experimental foraging grid consisted of two grey plastic blocks (13.5 Sem and 25cm high) positioned adjacent to one another. Each block contained six identical wells (1.3 cm diameter, O31 deep). Birds first learned to search for food beneath the discs in five stages. In the first stage, we placed millet seeds in four randomly selected wells, with no discs present. In the second stage, four discs (two bicolour and one each of the two solid colours that comprised the bicolour discs' halves) were on the grid, adjacent to the baited wells. In the third stage, discs were placed halfway covering the baited wells. In stage four, discs were placed tipped into the baited wells so as to hide the seed, but with discs fitted only loosely into the well. In the fifth and final stage, discs were placed to completely cover the baited wells, so a bird could only access the seed by flipping the disc off of the well using its beak. For each of the training trials, success was defined as obtaining the seeds from any of the baited wells within a 20-min period. A subject had to pass three consecutive trials of each stage to progress to the next stage.

Once the birds had learned the basic task of searching for food under the discs, they were trained to associate only bicolour discs with a food reward, a stage that we call 'bicolour association'. Zebra finches were trained in bicolour association using a total of six discs: two 1|8 bicolour discs (which were baited), and two solid discs each of colour 1 and 8 (which were not baited). Importantly, our behavioural data show that birds learned to recognize bicolour versus solid discs, rather than particular colour combinations of bicolour discs, as shown by their ability to extrapolate the colour-based task to a greyscale task (described below). Trials lasted for 2 min, during which time birds were allowed to flip discs, and we recorded the order of the first two discs that they flipped. Following the 2-min observation period, the grid and any remaining millet were left in for up to 20 min. To pass a bicolour association trial, birds had to flip over both bicolour (1|8) discs before either solid disc was flipped within the 2-min observation period. Birds had to pass six out of seven consecutive trials before we determined that they had learned the task, after which they progressed to data collection. If an individual failed more than one training trial out of seven, they were given additional training trials until the pass criterion was reached or until we determined the bird was unlikely to learn the task. In total, 26 out of 30 birds (87%) that reached the bicolour association trial passed and went on to data collection trials.
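
To give a sense of how demanding this pass criterion is, the sketch below enumerates every order in which six discs (two bicolour, four solid) could be flipped and counts the orders in which both bicolour discs come first. It assumes, purely for illustration, that a bird with no colour information flips discs in a uniformly random order; the chance level actually used in the figures is not derived in this section.

```python
from fractions import Fraction
from itertools import permutations

discs = ["bicolour", "bicolour", "solid", "solid", "solid", "solid"]

# Enumerate every possible order in which the six discs could be flipped.
orders = list(permutations(range(len(discs))))
passes = sum(
    1 for order in orders
    if discs[order[0]] == "bicolour" and discs[order[1]] == "bicolour"
)

chance_pass = Fraction(passes, len(orders))
print(chance_pass)  # 1/15
```

Under this assumption the result matches the direct calculation (2/6) × (1/5) = 1/15, or roughly 7% of trials.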

Data collection trials involved the same grid and 12-well set-up as in training. Six wells were covered with discs (two bicolour, four solid), and only the two bicolour discs were baited. We used the 'sample' function in R to create a set of randomized locations for each disc in each trial (up to 6 trials for each colour combination). Discs were placed in haphazard orientations (that is, the direction of the line bisecting the middle was not consistent from one trial to the next). During data collection, birds were allowed up to 2 min to flip discs, after which time the grid and any remaining seed were removed. We recorded both the order in which the first two discs were flipped, as well as the latency to flipping the first disc. Observers were not blind to the tasks; however, up to three of five (usually one or two) observers collected data on a given day and no single observer consistently collected data for a given trial type (for example, 5|6). Results were consistent across the five observers who collected data during the experiment.
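
The randomization of disc locations was done in R with the 'sample' function. A rough Python analogue is sketched below using `random.sample`, which likewise draws distinct items without replacement. The well numbering, disc labels and seed are illustrative assumptions, not details from the study.

```python
import random

N_WELLS = 12  # two blocks of six wells
DISCS = ["bicolour", "bicolour", "solid_a", "solid_a", "solid_b", "solid_b"]

def randomize_layout(rng: random.Random) -> dict:
    """Assign each disc to a distinct, randomly chosen well (wells numbered 1 to 12)."""
    wells = rng.sample(range(1, N_WELLS + 1), k=len(DISCS))
    return dict(zip(wells, DISCS))

layout = randomize_layout(random.Random(2018))  # seed chosen arbitrarily
for well in sorted(layout):
    print(well, layout[well])
```

Passing in a seeded generator makes a layout reproducible, which is convenient when trial sheets need to be regenerated.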

We conducted 3–5 trials per bird per day. Overall, we collected data from 26 birds (trained in three independent cohorts of n = 10, n = 6 and n = 10) over the course of 7,015 experimental trials. In 10.1% of trials (n = 712), birds flipped either no discs or one disc and then stopped flipping discs. In these cases, we did not count these trials as failures but rather removed the trial from the dataset. Most commonly, birds did not flip any discs or only flipped one disc if other birds in the room suddenly went quiet or if they were startled by the behaviour of another bird. Removing these incomplete trials from the dataset ensured that we did not bias our data towards an increased failure rate. At the end of each trial day, we performed a motivation check on each bird by returning the bird's food dish to the cage and observing for up to 10 min to ensure that they would eat seeds. Birds typically began eating within 30 s of having their food returned, indicating that individuals still had high feeding motivation even at the end of our trials.
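
The scoring rule described above (pass, fail, or removed as incomplete) can be written down compactly. The sketch below is one possible encoding, assuming each trial is recorded as the ordered list of disc types that were flipped; the record format and example data are hypothetical.

```python
def classify_trial(flipped):
    """Classify one trial from the ordered list of disc types that were flipped.

    Returns "incomplete" if fewer than two discs were flipped, "pass" if the
    first two flipped discs were both bicolour, and "fail" otherwise.
    """
    if len(flipped) < 2:
        return "incomplete"
    return "pass" if flipped[0] == flipped[1] == "bicolour" else "fail"

def pass_rate(trials):
    """Proportion of completed trials that were passed (incomplete trials dropped)."""
    outcomes = [classify_trial(t) for t in trials]
    completed = [o for o in outcomes if o != "incomplete"]
    if not completed:
        raise ValueError("no completed trials")
    return completed.count("pass") / len(completed)

# Hypothetical flip records for one bird and one colour combination.
trials = [
    ["bicolour", "bicolour"],           # pass
    ["solid", "bicolour"],              # fail
    [],                                 # incomplete: no discs flipped
    ["bicolour"],                       # incomplete: one disc flipped
    ["bicolour", "bicolour", "solid"],  # pass
]
print(pass_rate(trials))  # two passes out of three completed trials
```

The share of removed trials reported in the text can be checked in the same way: 712 / 7,015 is about 10.1%.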
Categorization experiments. Categorization experiments tested whether birds responded in a continuous or discontinuous manner to colour variation, identifying the location of boundaries if they were present. As in all trials, we presented birds with two bicolour discs and two of each solid colour disc (six discs total, spread randomly across 12 possible wells on the grid). Each day, a bird was given one refresher trial (1|8 bicolour and 1 and 8 solid) and five experimental trials of a single colour combination. The next day the bird would receive a refresher trial and five trials on a different colour combination, and so on. Colour combinations used in categorization trials included the combination of colour one and all other colours (that is, 1|2, 1|3, 1|4 ...) as well as the combination of colour eight and all other colours (that is, 8|7, 8|6, 8|5 ...). Additionally, as a control for the possibility that birds used olfactory cues to locate seeds under certain discs, we performed 1|1 and 8|8 trials to check that, in the absence of any variation in visual information, birds would not perform better than chance at flipping discs placed on baited wells. Birds were shown colour combinations in an order that alternated relatively distant colour combinations (those that were five to seven steps apart) with relatively closer colour combinations (those that were one to four steps apart). Each bird was given two experimental days (10 total trials) with each colour combination, and one experimental day (5 total trials) with each of the 1|1 and 8|8 comparisons. The second experimental day for each combination occurred only after the first experimental day for all other combinations had occurred (that is, experimental days for a specific colour combination did not immediately repeat).

Preliminary data analysis identified a putative boundary between colours 5 and 6. To analyse these data, we calculated the proportion of trials that each bird passed for each comparison (15 total comparisons per bird). Using only data from categorization trials, we built a linear mixed model (throughout, linear mixed models were calculated using R package lme4) that included pass rate as its response variable, Euclidean chromaticity distance (ΔS) between the two colours being compared and whether the comparison crossed the putative 5–6 boundary as fixed effects, and bird ID as a random effect. The model included random slopes in addition to random intercepts, to account for variation among birds in the strength of the effect of crossing the 5–6 boundary.

We visually inspected the residuals of this model using a quantile-quantile plot, histogram and a scatterplot of residuals versus predicted values to confirm that the residuals of the model were approximately normally distributed around zero and that they were homoscedastic.
Discrimination experiments. A second requirement for demonstrating categorical perception is that subjects show increased discrimination between stimuli that cross a category boundary compared to equally spaced stimuli that do not cross a boundary. Thus, for discrimination trials, bicoloured discs were made of colour pairs that were one space apart (that is, 1|2, 2|3, 3|4 ...; Extended Data Fig. 3), two spaces apart (that is, 1|3, 2|4, 3|5 ...; Fig. 3) or three spaces apart (that is, 1|4, 2|5, 3|6 ...; Extended Data Fig. 4). The experimental set-up and criteria for passing were the same as for categorization trials. For each bird, mean pass rates were calculated for all discrimination trials that did not cross the putative boundary and, separately, for those that did. We then calculated the difference between these means for each bird and used a two-sided t-test to determine whether the mean difference in pass rate was significantly different from zero.
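
The test described above is a paired comparison: one difference per bird, tested against zero. The sketch below computes the corresponding t statistic and degrees of freedom with the Python standard library. The per-bird pass rates are invented for illustration, and obtaining a P value would additionally require a t distribution (for example from a statistics package), which is left out here.

```python
import math
import statistics

def paired_t(cross, within):
    """Paired t statistic and degrees of freedom for per-bird mean pass rates."""
    if len(cross) != len(within) or len(cross) < 2:
        raise ValueError("need two equal-length samples with at least two birds")
    diffs = [c - w for c, w in zip(cross, within)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / se, len(diffs) - 1

# Hypothetical per-bird mean pass rates (not data from the study).
cross = [0.80, 0.70, 0.90, 0.60]    # comparisons crossing the boundary
within = [0.50, 0.50, 0.50, 0.50]   # comparisons not crossing the boundary
t_stat, df = paired_t(cross, within)
print(round(t_stat, 2), df)
```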

Combining categorization and discrimination data in a single statistical model. To present our data in the most easily interpreted format, we present separate analyses for categorization and discrimination trials in the main text. To bolster our conclusions, maximize statistical power and contain our analysis in a single model, we built a linear mixed model that includes all discrimination and categorization data. As reported in the main text, this model contains random effects of bird ID and includes crossing the 5–6 boundary as a random slope. As in the main text, significance tests were performed using ANOVA comparisons. The results of the single model can be found in Extended Data Table 3.

Greyscale discrimination experiments. The set of eight Munsell colours that we used in this experiment were selected primarily based on their colour: in particular, the colours of the Munsell colours aligned with those of actual beaks and were equally spaced in a chromaticity space that describes the hue and saturation of colours and from which brightness has been factored out. However, brightness is also an important part of how colour is perceived. To examine the effect that brightness has on structuring zebra finch perception of colours, we performed a second experiment in which we used shades of grey that matched our eight Munsell colours in brightness as perceived by zebra finches.

To select appropriate shades of grey, we used an integrating sphere to measure reflectance spectra from a set of 72 grey paint swatches (Behr brand, Behr Process). We then calculated the quantum catch of the zebra finch double cone for each of our eight Munsell colours and for each of the grey paint swatches; the double cone is believed to be the primary way in which birds encode brightness information. By finding shades of grey that matched each Munsell colour in double cone quantum catch, we selected a set of eight grey shades that were equivalent in bird-perceived brightness to the colour stimuli that we had been using.
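
One simple way to carry out such a matching step is to pick, for each colour, the grey swatch whose double cone quantum catch is numerically closest. The sketch below does exactly that. The nearest-value criterion, the function name and all numbers are assumptions made for illustration; the text does not specify the matching rule that was used.

```python
def match_greys(colour_catch, grey_catch):
    """For each colour, pick the grey whose double cone quantum catch is closest."""
    return {
        colour: min(grey_catch, key=lambda g: abs(grey_catch[g] - q))
        for colour, q in colour_catch.items()
    }

# Hypothetical double cone quantum catches (not values from the study).
colour_catch = {1: 0.12, 2: 0.15}
grey_catch = {"grey_a": 0.11, "grey_b": 0.16, "grey_c": 0.30}
print(match_greys(colour_catch, grey_catch))  # {1: 'grey_a', 2: 'grey_b'}
```

With a real swatch library one would also want to check that no grey is assigned to two colours and that the residual mismatch is small relative to the brightness differences between neighbouring colours.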

Using this set of eight shades of grey, we created experimental discs for the 1|8-grey and each one-apart combination of grey shades. We followed the same experimental procedure as above to examine the birds' discrimination ability when hue information had been removed (Extended Data Fig. 5).
Wavelength discrimination function. To investigate whether the categorical perception that we observed could be solely due to the wavelength discrimination function (WDF) of avian photoreceptors, we examined a WDF that was derived using electroretinography in the pigeon C. livia. We know of no experimentally or behaviourally derived WDF for zebra finches. The pigeon is a reasonable substitute, however, given that the spectral sensitivity peaks for its medium- and long-wavelength sensitive cones (505 and 565 nm, respectively) are very similar to those of the zebra finch (507 and 565 nm, respectively).

We plotted electroretinography-derived WDF data alongside the spectral sensitivity peaks of the zebra finches (Extended Data Fig. 6). The units of the WDF are arbitrary units in keeping with the original publication, but show the general pattern of wavelength discrimination. One complication is that it is not possible to know precisely where in the visible spectrum the avian perception of 'orange' and 'red' would occur. However, in humans, orange and red both occur above 580 nm, in the region in which stimulation is primarily of the long-wavelength cone and secondarily the medium-wavelength cone. The WDF showed a discriminability peak, followed by a relatively smooth decrease in discriminability in the region of the spectrum in which avian viewers probably see orange and red (Extended Data Fig. 6). The shape of these curves suggests that photoreceptor sensitivity on its own probably does not explain categorical perception. From this analysis alone, however, we cannot rule out that hue discrimination based on spectral sensitivity curves contributes to categorical perception. Ultimately, whether categorical perception arises at the level of the photoreceptor or retina, or at a higher-order process does not affect our findings or interpretation, but suggests avenues for future research.


Data reporting. No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

ata availability. The datasets generated and analysed daring the curren study ane 
svallble in the Duke Data Repository: htps:/doLory/10.7924/r4r=96199 
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Extended Data Fig. 1 | Downwelling vector irradiance at the level ofthe standard deviation in either digection. The orange line illustrates standard 
foraging grid, Units are photons per cm: per nm. Tungsten bulbs were Illuminant A, which we used throughout for viseal modelling because itis 
used tlluminate each cage from a set distance. There was some variation a standard spectrum (and thus repeatable by other researchers) and closely 
in radiance between cages, The be line represents the mean absolute matched the ambient lighting in our room, 


irradiance of our halogen bulbs, and the grey-shaded region indicates one 
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Extended Data ig 2| Reflected radiance of experimental stimuli under spectrometer, suggesting that the ws ofa trchromatic mode (versus 
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‘Below 400 nm, the values are so low that they reach the noise floor ofthe 
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Color comparison 


Extended Data Fig. 4 | Results from three-apart discrimination trials, 
a,b, Results show data acrossall birds (a) and for each bied individually 
(by), Bass rate was significantly higher forthe three comparisons that 
crossed the boundary (36, 4|7 and 5|8; green boxes, grey-shaded area) 
than for those that did not (1/4 and 2[5; blue boxes, white-shaded area) 
(paired t-test, f5=6.07, P< 0.001). Numbers in parentheses inside boxes 
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aT ae Ne Yes 
re.40) (p43) ies 
cross boundary? 

are numberof birds that participated in each type of comparison. Box 
plots depict the median (horizontal line inside the box), 25th and 75th 
percentiles (box), 25th and 75th percentiles +1.5% interquartile range 
(vehiskers) and outliers (circles) The horizontal grey line represents the 
‘expected pass rate if birds ip discs by chance. Numbers in square brackets 
Aare Michelson contrast values for a given coloue pai. 
Extended Data Fig. 5 | Results from greyscale (that is, hue information
removed) one-apart discrimination experiments. These greyscale
experiments did not replicate the categories that we observed when hue
information was included, indicating that categories are not structured
based on brightness alone. Box plots depict the median (horizontal line
inside the box), 25th and 75th percentiles (box), 25th and 75th percentiles
± 1.5 × interquartile range (whiskers) and outliers (circles). Numbers
in parentheses below each comparison are Michelson contrast values.
Linear mixed models showed that, in our greyscale experiments (that is,
without chromaticity information), Michelson contrast between greyscale
pairs significantly predicted pass rate. This finding is consistent with the
possibility that category formation may be the result of both chromaticity
and brightness information (see Table 1). Sample size was 18 birds for all
comparisons.
Extended Data Fig. 6 | Wavelength discrimination of avian
photoreceptors. The wavelength discrimination function of the pigeon
Columba livia (black dashed line) plotted against the spectral sensitivity
peaks of the zebra finch (green, red, blue lines). Original data for the
electroretinography-derived wavelength discrimination function are from
Riggs et al. and have been inverted so that higher numbers indicate
greater discrimination.
Extended Data Fig. 7 | Schematic illustrating differences between
continuous and categorical perception. Under continuous perception
(solid line), receivers perceive and respond in a continuous fashion
to signal variation, meaning that any change in a signalling trait is
associated with a concomitant change in receiver response. Under
categorical perception (dashed line), such as described here for female
zebra finches, receivers show enhanced discrimination of variants across
a boundary (hash marks on x axis) compared to equally spaced variants
within a category. Zebra finch line drawing by N. Silina licensed under an
Attribution 4.0 International (CC BY 4.0) licence
(http://supercoloring.com/pages/zebra-finch).


LETTER

Extended Data Table 1 | Photon catch values for the short-, medium-
and long-wave cones of the zebra finch

Extended Data Table 2 | Chromatic distances between selected Munsell
colours under different spectral sensitivities and lighting conditions

Extended Data Table 3 | Results of a single linear mixed model
containing all labelling and discrimination data

Ecosystem warming extends vegetation activity but
heightens vulnerability to cold temperatures

Andrew D. Richardson, Koen Hufkens, Thomas Milliman,
Donald M. Aubrecht, Morgan E. Furze, Bijan Seyednasrollah,
Misha B. Krassovski, John M. Latimer, W. Robert Nettles,
Ryan R. Heiderman, Jeffrey M. Warren & Paul J. Hanson


Shifts in vegetation phenology are a key example of the biological
effects of climate change. However, there is substantial uncertainty
about whether these temperature-driven trends will continue, or
whether other factors (for example, photoperiod) will become
more important as warming exceeds the bounds of historical
variability. Here we use phenological transition dates derived
from digital repeat photography to show that experimental
whole-ecosystem warming treatments of up to +9 °C linearly correlate
with a delayed autumn green-down and advanced spring green-up
of the dominant woody species in a boreal Picea-Sphagnum bog.
Results were confirmed by direct observation of both vegetative
and reproductive phenology of these and other bog plant species,
and by multiple years of observations. There was little evidence
that the observed responses were constrained by photoperiod.
Our results indicate a likely extension of the period of vegetation
activity by 1-2 weeks under a 'CO2 stabilization' climate scenario
(+2.6 ± 0.7 °C), and 3-6 weeks under a 'high-CO2 emission' scenario
(+5.9 ± 1.1 °C), by the end of the twenty-first century. We also
observed severe tissue mortality in the warmest enclosures after a
severe spring frost event. Failure to cue to photoperiod resulted in
precocious green-up and a premature loss of frost hardiness, which
suggests that vulnerability to spring frost damage will increase in a
warmer world. Vegetation strategies that have evolved to balance
tradeoffs associated with phenological temperature tracking may
be optimal under historical climates, but these strategies may not
be optimized for future climate regimes. These in situ experimental
results are of particular importance because boreal forests have
both a circumpolar distribution and a key role in the global carbon
cycle.

In temperate and boreal regions, rising temperatures are advancing
spring onset (for example, budburst and flowering) and delaying
autumn senescence (for example, leaf coloration and leaf fall).
Whether these trends will be maintained is an open question.
Warm and cold temperatures, photoperiod and insolation, and
precipitation and water availability have all been shown to influence
plant phenology. However, the future response of phenology
to rising temperatures remains largely unknown because of the
high degree of uncertainty associated with interactions among these
drivers. Importantly, it has previously been proposed that photoperiod
may constrain the phenological response to rising air temperatures.
Although there is evidence for this in some species, the generality of
these results (and whether there are robust patterns across functional
types) has yet to be demonstrated.

Analyses of observational datasets to disentangle the effects of these
drivers are challenged by the lack of variability in natural systems, the
inherent correlation among drivers and the realism of space-for-time
assumptions. Experimental approaches are thus required. However,
there are sizable challenges associated with conducting realistic
environmental manipulations, particularly for ecosystems with tall
vegetation. Because of financial, logistical and technological hurdles,
experimental warming treatments have not previously been applied to
forest stands, and have only rarely been applied to single mature trees.
Although experiments with seedlings and branch cuttings are relatively
common, artefacts associated with these approaches may limit their
broader applicability.

We have been studying the effect of experimental whole-ecosystem
warming treatments on vegetation phenology at the 'Spruce and
Peatland Responses Under Changing Environments' (SPRUCE) facility,
a long-term, multi-factor manipulative experiment situated in a boreal
peatland forest in the Upper Midwest of the United States. To our
knowledge, this experiment is unique in that the five levels of warming
(from 0 to +9 °C; see Methods, Extended Data Fig. 1, Supplementary
Note 1, Extended Data Table 1) are being applied to intact communities
of native plants, including woody shrubs and mature trees. The
dominant plant species at SPRUCE represent key genera that are found
across the vast boreal forest (taiga), which covers much of the land
surface of the Northern Hemisphere from 45° to 70° N. Knowledge of
the environmental controls on the phenology of these species is poor
and does not at present provide a strong basis for making predictions
about the capacity for phenological tracking of a warmer climate.
Results from SPRUCE will therefore inform our understanding of the
effects of climate change on processes related to biogeochemical cycling
and biosphere-atmosphere feedbacks for this globally extensive biome.

Our focus here is on the effect of the experimental ecosystem warming
treatments on spring and autumn phenology in this forested peat
bog. Specifically, we tested three competing hypotheses. First, that
temperature is the dominant control on phenological events (hereafter
referred to as H1). This hypothesis predicts that the observed phenological
transition date is directly related to the degree of warming (Fig. 1).
Second, that photoperiod is the dominant control on phenological
events (hereafter referred to as H2). This hypothesis predicts that the
observed phenological transition date is constant regardless of the
degree of warming (Fig. 1). Third, that photoperiod constrains the
phenological response to temperature (hereafter referred to as H3).
This hypothesis predicts that the observed response to temperature is
flat beyond a threshold temperature, t* (Fig. 1).

We tracked phenological responses to the experimental treatments in
two ways. Since August 2015 we have monitored the vegetation within
each enclosure using digital repeat photography (Fig. 1d, e), and since
April 2016 we have made weekly ground observations of vegetative and
reproductive phenology on a variety of plant species.

For our analysis of camera imagery, we distinguished between
three distinct vegetation types: an evergreen conifer, Picea mariana
(black spruce); a deciduous conifer, Larix laricina (eastern tamarack
or larch); and a mixed, ground-level shrub community dominated
by Rhododendron groenlandicum (Labrador tea) and Chamaedaphne
calyculata (leatherleaf). For each vegetation type, green-down (as
determined by Gcc, a colour index derived from the digital images) in

Fig. 1 | Testing competing hypotheses for phenological responses to
warming using data from a whole-ecosystem warming experiment.
a-c, Conceptual model of the relationship between temperature and
vegetation phenology illustrating three competing hypotheses.
a, Temperature is the dominant control (H1). b, Photoperiod is the
dominant control (H2). c, Photoperiod limits the temperature response
above the temperature threshold t* (H3). d, e, Sample digital camera
imagery showing the inside of plot 19 (unheated control enclosure) (d)
and plot 17 (+9.0 °C warming treatment enclosure) (e) on 6 April 2016. At
the time the photographs were taken, the air temperature was 5 °C in
plot 19 (note the last snow of the season), compared to 14 °C in plot 17.


autumn 2015 was delayed with increasing warming (Fig. 2a-c). The
response to warming was significantly stronger (interaction effect
between temperature and species, P < 0.001) for the mixed shrub
community (about 5 days delay per 1 °C warming) than for either of the tree
species (1-2 days delay per 1 °C warming), but was in all cases highly
linear. Our results unequivocally support H1; that is, that temperature
is the dominant control on the timing of autumn phenology. The fact
that the temperature sensitivities were in all cases significantly different
from zero allows us to reject H2. In no case did our breakpoint analysis
(see Methods) identify a t* value that substantially improved model fit
(Extended Data Table 2), allowing us to reject H3. The above results
are for autumn 2015, and comparable results were observed in autumn
2016 and 2017 (Supplementary Note 2).

Similarly, green-up in spring 2016 was advanced with increasing
warming (Fig. 2d-f). The response to warming (1-2 days advancement
per 1 °C warming) was not significantly different among vegetation
types (interaction effect between temperature and species, P = 0.34).
As in autumn, the fact that the temperature sensitivities were
significantly different from zero allows us to reject H2. Breakpoint model
analysis allowed us to reject H3, as in no case was a t* value identified
that would improve model fit (Extended Data Table 2). In spring, as
in autumn, H1 is best supported by the experimental results. Results
in spring 2017 were generally consistent with those for spring 2016
(Supplementary Note 2).

The above results clearly indicate a continued extension of the period
of vegetation activity in response to future warming. By combining
downscaled climate projections (Extended Data Fig. 2) from CMIP5
with the phenological temperature sensitivities estimated from Fig. 2
(Supplementary Note 3), we predict that the physiologically active
season of the two conifer species may be extended by about a week under a
'CO2 stabilization' climate scenario (representative concentration
pathway (RCP) 4.5, +2.6 ± 0.7 °C), and up to three weeks under a 'high-CO2
emission' scenario (RCP8.5, +5.9 ± 1.1 °C) by the year 2100 (Extended
Data Table 3). Active season extension for the shrub layer is projected to
be roughly twice as large as that of the conifers. These results are judged
to be entirely plausible, given that future warming is not projected to
exceed the levels of experimental warming at SPRUCE and that we are
thus not extrapolating into unsampled climate space.
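
The arithmetic behind a projection of this kind can be illustrated with a minimal sketch. It assumes, purely for illustration, that the extension of the active season is the sum of the spring advancement and the autumn delay (each in days per 1 °C) multiplied by the projected warming; the actual procedure is described in Supplementary Note 3, which is not reproduced here. The function name and arguments are hypothetical.

```python
def season_extension_days(spring_advance_per_degc, autumn_delay_per_degc, warming_degc):
    """Illustrative estimate of active-season extension, in days.

    Assumes (hypothetically) that the spring and autumn sensitivities are
    linear and additive, as the linear responses reported above suggest.
    """
    return (spring_advance_per_degc + autumn_delay_per_degc) * warming_degc


# Conifer sensitivities of 1-2 days per 1 degC at each end of the season,
# combined with the +2.6 degC stabilization scenario quoted in the text.
low = season_extension_days(1.0, 1.0, 2.6)
high = season_extension_days(2.0, 2.0, 2.6)
print(f"{low:.1f} to {high:.1f} days")
```

Under these assumptions the range spans roughly one to one-and-a-half weeks, which is of the same order as the "about a week" quoted for the conifers.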

Previous work has shown that the seasonality of Gcc is a robust
proxy for the seasonality of vegetation photosynthesis in both conifer
forests and wetland ecosystems, and thus earlier plant green-up
and delayed green-down at SPRUCE are almost certainly associated
with a longer photosynthetically active period, and probably associated


Fig. 2 | Effect of whole-ecosystem warming treatments on dates of
autumn green-down and spring green-up, as derived from digital
camera imagery. a-f, Response of autumn green-down (a-c, 2015)
and spring green-up (d-f, 2016) phenology to experimental warming
treatments for L. laricina, P. mariana and a mixed shrub layer community
dominated by R. groenlandicum and C. calyculata, based on observations
across n = 10 experimental enclosures (n = 9 for Larix, as in one
enclosure this species was not within the camera field of view). The x axis
of each panel is plot temperature (ΔT, °C). Green-down and green-up are
proxies for autumn senescence and spring onset, respectively. Error bars
indicate 95% confidence interval around estimated phenological transition
dates. Additional results are presented in Supplementary Note 2 and
Extended Data Table 2. DOY, day of year; RMSE, root mean squared error.

Table 1 | Effect of SPRUCE warming treatments on spring and autumn phenological events (phenophases)


eaves growing 5 ais2089 a eaarigs 7 ~~—aNgs125—~=S SaGr 136 
‘Shoots slongating 4 -as3s0s2 2 s14sis9 «5 3eas1on 4 7241.86 
Flowering (cones open) 6 25120895 eovs1s9 7) -ga1s133 5 674.234 
Flowers terminated 6 145117, 455 
Fruiting 1-256 6 20921383 6064152 
‘autumn buds 2 0592103 

‘Autumn coloration (senescence) 6 2705145 2 4731288 


with enhanced annual photosynthetic uptake (though not necessarily
increased vegetation growth). This result is consistent with the analysis
of long-term data from FLUXNET sites (http://fluxnet.fluxdata.org;
Supplementary Note, Extended Data Fig. 3), as well as previous
experimental and observational studies. However, this does not necessarily
indicate an increase in net carbon uptake or carbon sequestration under
future warming, because the long-term carbon balance of this peatland
forest ecosystem is probably dependent on the stability of the
underlying peat deposits.

Camera-based results are generally consistent with direct observation
of spring (2016 and 2017) and autumn (2017 only) phenological
transitions for plant species spanning a range of leaf habits and growth
forms (Table 1; see also Supplementary Note 5, Extended Data Tables 4,
5). Spring phenophases advanced by just over three days per 1 °C
warming, providing strong support for H1. Autumn phenophases related to
leaf coloration or senescence were delayed by almost three days per 1 °C
warming, again providing support for H1. Relatively little variation was
observed in dates of autumn bud set for Chamaedaphne and Picea,
providing support for H2 for this particular phenophase of these species.
Although t* breakpoints that improved model fit were commonly
identified, we note that in most cases the difference in the
small-sample-corrected Akaike's information criterion (ΔAICc; see Methods)
was greater than zero, which means that the simple, linear temperature
model was better supported by the data. Furthermore, the identified
breakpoint temperatures were generally very high (below 4.5 °C in only a
few instances), indicating that future warming would have to greatly exceed
RCP4.5 projections before photoperiod constraints begin to limit
phenological shifts. The ground observations therefore robustly support H1
over H2 or H3, and are consistent with the future extension of the active
season at both ends.

There is abundant evidence in the literature that photoperiod has a
role in triggering phenological events. In many species, there has
been a local adaptation of phenology to both photoperiod and
temperature cues. In some species and environments, photoperiod sets
a hard limit on the phenological response to rising temperatures.
But, with warming of up to +9 °C above current levels, we found little
evidence for this in most of the species and phenophases that we
studied. Thus, photoperiod requirements are still being met even during
the shortened winter simulated by the warmest enclosures. In the few
cases in which there was evidence of a photoperiod effect, it was
generally only a factor at temperatures well above current temperatures,
again indicating that substantial future warming would be required for
photoperiod to become limiting. These findings are consistent with a
recent analysis showing that for high-latitude species, spring leaf-out
was generally not sensitive to photoperiod.

The purported role of photoperiod as a phenological constraint is to
prevent plants from responding to temperature signals at the 'wrong'
time of the year. However, if photoperiod is not a strong constraint
on spring phenological development, then a counterintuitive prediction
is that continued warming coupled with increasing frequency of
climate extremes may increase the likelihood of spring frost damage.


At SPRUCE, atypical weather in March (unusually warm) and April
(extreme cold) 2016 showed that in addition to triggering visually
apparent phenological shifts, the warming treatments also advanced
tissue de-hardening and thereby heightened the potential for spring
frost damage (Supplementary Note 6, Extended Data Fig. 4). Following
a spring frost event in which ambient temperatures dropped to −15 °C,
we observed extensive foliar damage in the +9.0 °C enclosures (in
which temperatures dropped to about −4 °C) and moderate damage
in the +6.75 °C enclosures. Minimal damage occurred in the enclosures
that received less warming and thus experienced colder minimum
temperatures. This suggests that the transition from frost-hardy to
frost-vulnerable is cued by warm temperatures and is not constrained
by photoperiod. Without photoperiod as a safety check on the
de-hardening process, frost damage may be more severe and/or more
frequent under future climate conditions. Woody plants generally have
sufficient nonstructural carbon reserves to recover from occasional frost
damage, but repeated damage could impair the competitive ability
of susceptible species (Extended Data Table 6).

Results from the first two-and-a-half years of the SPRUCE experiment,
conducted in a winter-dormant ecosystem, show decisively that
warming treatments directly influence vegetation phenology at both the
start and end of the annual period of vegetation activity. These
phenological shifts will almost certainly influence photosynthesis and
transpiration, as well as feedbacks to the climate system through effects
on the surface energy budget. Future extension of the active season in
most cases appears unlikely to be strongly constrained by photoperiod
in this boreal ecosystem. Potentially inopportune responses to
environmental signals may occur as the climate moves beyond the range
of historical variability, as demonstrated by the spring frost damage in
the warmest enclosures. Thus, temperature-tracking strategies evolved
to guide phenological responses to historical year-to-year variation in
weather may be increasingly mismatched to future conditions.


Online content

Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,

METHODS
Statistical methods were not used to predetermine sample size for the regression
design. The warming treatments were randomized among 10 plots with similar
vegetation and uniform peat depths. Investigators were not blinded to allocation
during experiments and outcome assessment.

Study site and experimental design. The SPRUCE experiment is located within
the S1 peat bog at the Marcell Experimental Forest (47° 30.171' N, 93° 28.970' W),
approximately 40 km north of Grand Rapids in north-central Minnesota. The
historical climate at the site is sub-humid continental: mean annual temperature
is 4 °C, mean annual precipitation is 750 mm, and extreme temperatures range from
−38 °C to +30 °C. Because this ecosystem is located at the southern edge of the
boreal zone, it is considered particularly vulnerable to climate change.

The S1 bog is an ombrotrophic peatland with a perched water table. Trees
are approximately 5-8 m in height. Canopy vegetation is dominated by the tree
species P. mariana (Mill.) B.S.P. (black spruce), with additional contributions from
L. laricina (Du Roi) K. Koch (eastern tamarack or larch). P. mariana and L. laricina
both have a vast geographic range across North America, from Alaska east to
Quebec and Labrador, and south to the Great Lakes and New England. A number
of closely related Picea and Larix species are distributed across the boreal zone
of northern Europe, Scandinavia and much of Russia and Siberia, indicating the
relevance of results of this experiment to our understanding of boreal ecosystem
processes globally.

The SPRUCE understory is dominated by the evergreen shrubs
R. groenlandicum (Oeder) Kron and Judd (Labrador tea) and C. calyculata (L.)
Moench. (leatherleaf) and is underlain by a bryophyte layer dominated by
Sphagnum spp. moss. Other common plant species include the evergreen shrub
Kalmia polifolia Wangenh. (bog laurel), the deciduous shrub Vaccinium
angustifolium Aiton 1789 not Benth. 1840 (lowbush blueberry), the sedge Eriophorum
spp. (cottongrass) and the perennial herb Maianthemum trifolium (L.) Sloboda
(false Solomon's seal).

At SPRUCE, experimental temperature (+0 °C 'unheated control' to +9.0 °C,
in 2.25 °C increments for both air and deep soil) and CO2 (ambient and elevated,
approximately 400 and 900 ppm, respectively) treatments are being applied
through the use of large (approximately 12 m wide, 8 m high) open-topped
octagonal enclosures. Overall, five temperature treatments are paired with two CO2
treatments, yielding a total of ten enclosures (additionally, there are two 'ambient
environment' plots without constructed enclosures). Each enclosure is
hydrologically isolated from the rest of the bog by a sheet-pile corral, which has been driven
3-4 m through the peat into the underlying ancient lake sediments. Outflow pipes
allow for lateral drainage from each enclosure. Within each enclosure, warming
of the deep soil began in June 2014, while aboveground warming was initiated in
August 2015, and at this time the phenological observations were commenced in
each individual plot (note that pre-treatment observations were made in a common
area, outside of the enclosures, beginning in 2010). CO2 treatments were switched
on in June 2016.

For context, the warmest enclosures (+9.0 °C) simulate current climate conditions
of Wichita, Kansas (mean annual temperature 13 °C, mean annual
precipitation 850 mm), located approximately 1,100 km (10° of latitude) to the south.
The SPRUCE experiment, with treatments that will exceed the historical range of
climatic variability (Extended Data Fig. 1), is intentionally planned to push the
system past projected warming levels to approach or include tipping points for
any number of ecosystem response variables. The regression-based experimental
design facilitates the estimation of temperature response functions, which may
be nonlinear.

The enclosure design, and detailed performance metrics for the above- and
belowground warming, along with a discussion of potential artefacts, are more fully
described and assessed in a previous publication. Observed temperature
differentials were consistent with the nominal warming treatments for target enclosures.
Warming was homogeneous within individual enclosures and was sustained over
time (see Supplementary Note 1, Extended Data Table 1).

Phenological observations. We are using two methods to track the phenological
responses of vegetation to warming and elevated CO2 in each enclosure. First,
beginning in August 2015, we installed digital cameras ('phenocams') in each
enclosure to track seasonal variation in vegetation greenness, a proxy for vegetation
phenology and associated physiological activity. Second, beginning in April
2016, human observers have been directly tracking phenological events of both
woody and herbaceous species.

PhenoCam imagery. Digital cameras (NetCam model SD130BN, StarDot
Technologies) were configured and installed following standard protocols of the
PhenoCam network. Cameras record sequential visible light (red, green, blue;
RGB) and visible + infrared images every 30 min from 4:00 to 22:40, every day
of the year. Minimally compressed JPEG images, accompanied by a metadata file
containing the current status of all camera settings and diagnostics, are uploaded
via file transfer protocol to the PhenoCam server for archiving and processing;
a local copy is also maintained on a server running at SPRUCE. The filename of
every image identifies the enclosure in which the picture was recorded, as well as
the date and time stamp in local standard time.

The aluminium structural members of each enclosure provided convenient
and consistent mounting points for the camera. All cameras were mounted at
a height of 6 m in the middle of the third horizontal structural member on the
south wall of each enclosure. Cameras were enclosed in lightweight, compact
weatherproof enclosures (model ENC-OUTDS, StarDot Technologies). Network
connectivity and DC power were delivered to each camera using a single Ethernet
cable and standard power-over-Ethernet technology. To reduce the likelihood of
lightning damage, an Ethernet surge protector (ProtectNet model PNET1GB, APC
by Schneider Electric) was installed on the camera end of each Ethernet cable, and
grounded to the mounting point.

All imagery is posted in near-real time to the PhenoCam project web page
(http://phenocam.sr.unh.edu), where it is publicly available. Images are processed
nightly using standard PhenoCam routines. In brief, this consists of several
steps. First, we defined three separate regions of interest (ROIs) for each camera
field of view, demarcating (1) Picea trees; (2) Larix trees; and (3) the mixed shrub
layer. The ROI definitions are converted to a binary mask, so that image analysis
can be completed separately for each vegetation type. Next, images were read in
sequentially and for each vegetation type the mean pixel value for each of the three
colour channels (red, green and blue; for the purposes of the present analysis we
used only the visible-wavelength imagery) was calculated across the corresponding
ROI, yielding a digital number (DN) triplet (R_DN, G_DN, B_DN). Then, for each ROI in
each image, we calculated the green chromatic coordinate, Gcc, which has
previously been shown to be a reliable metric for characterizing the seasonal trajectory
of vegetation colour and activity.
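
The green chromatic coordinate is conventionally defined as Gcc = G_DN / (R_DN + G_DN + B_DN), that is, the share of total image brightness contributed by the green channel. The following is a minimal sketch of the ROI-averaging step described above, not the PhenoCam network's own code; the function name, the array layout (height × width × RGB) and the boolean mask convention are assumptions made for illustration.

```python
import numpy as np


def green_chromatic_coordinate(image, mask):
    """Return Gcc for the region of interest selected by a binary mask.

    image: array of shape (H, W, 3) holding red, green and blue digital numbers.
    mask:  boolean array of shape (H, W); True marks pixels inside the ROI.
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Mean digital number of each colour channel across the ROI.
    r_dn, g_dn, b_dn = (image[..., k][mask].mean() for k in range(3))
    total = r_dn + g_dn + b_dn
    if total == 0:
        return float("nan")  # completely black ROI: Gcc is undefined
    return g_dn / total
```

Because Gcc is a ratio of channel means, it is less sensitive to overall changes in scene brightness than the raw green digital number.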


Basic quality control included eliminating images that were recorded when the
sun was too low above the horizon, images that were too dark or images that
were too bright. Additionally, because snow might obscure the vegetation of
interest, for each day from late August 2015 through the end of December 2017, we
visually inspected the mid-day image from each camera. We flagged images in
which there was (1) snow on the ground; (2) snow on trees. We excluded from
further processing all days on which the camera's view of the vegetation of interest
was potentially contaminated by snow. For the shrub layer, this meant eliminating
images from days with snow on the ground; for Picea and Larix this meant
eliminating images from days with snow on trees. The frequency of snow decreased with
increasing plot temperature, from over 100 days per year with snow on the ground
in the unheated enclosures (from late October to early May), to less than 30 days
per year in the +9.0 °C enclosures (from late November to early February). The
longest period of continuous snow cover was almost three months in the unheated
enclosures, compared with only two weeks in the +9.0 °C enclosures.

Next, we determined 3-day Gcc values using the 90th quantile method. We
then used a spline-based method to sequentially remove outliers in three
iterative steps. Finally, we refit the spline, and used the summertime maxima and
dormant-season minima to define the seasonal Gcc amplitude, from which we
were then able to identify dates at which 10%, 25% and 50% of the seasonal
amplitude were reached in autumn (senescence or green-down phase) and spring (onset
or green-up phase). Uncertainties on these dates were then derived based on the
uncertainty around the smoothing spline. Our analysis here focuses on the 25%
amplitude threshold dates.
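
The threshold-crossing step can be sketched as follows. This is an illustrative simplification rather than the PhenoCam routine: it assumes the Gcc series passed in has already been smoothed and covers a single green-up or green-down segment, and it locates the crossing by linear interpolation between samples instead of evaluating a fitted spline. The function name and arguments are hypothetical.

```python
import numpy as np


def transition_date(doy, gcc, fraction=0.25, direction="rising"):
    """Day of year at which Gcc crosses a fraction of its seasonal amplitude.

    doy:       days of year, increasing.
    gcc:       smoothed Gcc values for one green-up or green-down segment.
    fraction:  amplitude threshold (0.25 corresponds to the 25% dates).
    direction: "rising" for spring green-up, "falling" for autumn green-down.
    """
    doy = np.asarray(doy, dtype=float)
    gcc = np.asarray(gcc, dtype=float)
    baseline, peak = gcc.min(), gcc.max()
    threshold = baseline + fraction * (peak - baseline)
    above = gcc >= threshold
    for i in range(1, len(gcc)):
        crossed_up = direction == "rising" and not above[i - 1] and above[i]
        crossed_down = direction == "falling" and above[i - 1] and not above[i]
        if crossed_up or crossed_down:
            # Interpolate linearly between the two samples that bracket the threshold.
            step = (threshold - gcc[i - 1]) / (gcc[i] - gcc[i - 1])
            return doy[i - 1] + step * (doy[i] - doy[i - 1])
    return float("nan")  # threshold never crossed in the requested direction
```

Calling the function with `fraction` set to 0.10 or 0.50 gives the other two amplitude thresholds mentioned in the text.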

Ground observations. Ground observations of spring phenology were made at
approximately weekly intervals by W.R.N. and J.M.L. in 2016, and by R.R.H. in
2017. The protocol used by W.R.N. and R.R.H. involved recording, on a
pre-printed form for each of the 10 enclosures and the two ambient environment plots,
whether or not ('yes' or 'no') specific vegetative and reproductive phenophases were
observed each week. Observations were conducted on a selection of woody species
(the trees Picea and Larix, the evergreen shrubs leatherleaf, bog laurel, Labrador tea
and lowbush blueberry), as well as a sedge (cottongrass) and a perennial herb (false
Solomon's seal). We transcribed the data by taking as the observed date the first
survey date on which an event was definitively observed (that is, for 'no' through week
4, followed by 'yes' in week 5, the event occurred in week 5). Not all phenophases
were observed for all species, and in some difficult-to-observe cases the data were
deemed not reliable because of some inconsistencies in the recorded data (for
example, blank cells rather than 'no', or 'no' followed by 'yes' followed by 'no' again)
or poor representation of the species in question in some of the plots (for example,
bog laurel and lowbush blueberry are sparsely distributed). All transcribed data of
questionable reliability were excluded from the analysis.
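
The transcription rule described above can be sketched in a few lines. This is an illustration only: the function name, the use of `None` to flag a questionable record, and the decision to treat every blank cell or 'no' after 'yes' as disqualifying are assumptions, whereas the text indicates that reliability was judged case by case.

```python
def first_observed_date(survey_dates, responses):
    """Return the first survey date marked 'yes', or None if the record is questionable.

    survey_dates: survey dates (or week numbers) in chronological order.
    responses:    'yes' or 'no' for each survey; anything else counts as a blank cell.
    """
    first = None
    for date, response in zip(survey_dates, responses):
        if response not in ("yes", "no"):
            return None  # blank or unreadable cell
        if response == "yes":
            if first is None:
                first = date
        elif first is not None:
            return None  # 'no' recorded after an earlier 'yes'
    return first


# 'no' through week 4, followed by 'yes' in week 5: the event is assigned to week 5.
print(first_observed_date([1, 2, 3, 4, 5], ["no", "no", "no", "no", "yes"]))
```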

J.M.L.'s protocol involved recording the first date at which Larix leaf buds
were observed to be just beginning to break (data recorded for all ten enclosures,
plus the two ambient environment plots), and the first date on which flowers of
leatherleaf, bog laurel and Labrador tea were observed in each enclosure (data
recorded in only half of the treated enclosures, plus one or both of the ambient
environment plots). Although data recorded by J.M.L. are not as complete as
those recorded by W.R.N., they are included to demonstrate the robustness of the
observed patterns.
On-site meteorological data. Air temperature and relative humidity were
measured (model HMP-155, Vaisala) at four points above the peat surface
within each enclosure (0.5, 1, 2 and 4 m), and 30-min mean values were recorded. We
used the measured air temperature at 2 m in our analyses. SPRUCE
environmental data are available through the Vista Data Vision portal
(http://sprucedata.ornl.gov).
Historical perspective and future climate projections. To put the weather
during winter and spring of 2016 in historical context (122-year record), we
used data from the National Climatic Data Center of the NOAA. Specifically, we
used summary data from the State of the Climate report
(https://www.ncdc.noaa.gov/sotc/national), and three-month divisional
temperature rankings
(https://www.ncdc.noaa.gov/temp-and-precip/climatological-rankings). The SPRUCE
site falls within Minnesota climate division 2.

To place our results in the context of projected warming trends over the coming
century, we used downscaled (1/8°) climate projections from a selection of ten
models (see Supplementary Note 2) contributing to the CMIP5 multimodel ensemble
dataset. We used output for two RCP scenarios: RCP4.5 (CO2 stabilization)
and RCP8.5 (high CO2 emission). To quantify future trends, we calculated the
projected decadal mean air temperature change relative to the 2006–2015 mean
for each model.
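As a rough illustration of the decadal-change calculation described above, the sketch below averages annual temperatures into ten-year bins aligned to the 2006–2015 baseline and subtracts the baseline mean. The function name `decadal_change`, the dictionary input format and the synthetic series are assumptions for the example; the actual model output and processing chain are not given in the text.

```python
from statistics import mean
from typing import Dict, List, Tuple


def decadal_change(annual_temp: Dict[int, float],
                   baseline: Tuple[int, int] = (2006, 2015)) -> Dict[int, float]:
    """Decadal mean temperature minus the baseline-period mean.

    Keys of the result are the first year of each ten-year bin; bins are
    aligned so that the baseline period itself forms one bin.
    """
    start, end = baseline
    base = mean(t for yr, t in annual_temp.items() if start <= yr <= end)
    bins: Dict[int, List[float]] = {}
    for yr, t in annual_temp.items():
        bin_start = start + 10 * ((yr - start) // 10)
        bins.setdefault(bin_start, []).append(t)
    return {b: mean(vals) - base for b, vals in sorted(bins.items())}


# Hypothetical series for one model: steady warming of 0.1 degC per year
series = {yr: 5.0 + 0.1 * (yr - 2006) for yr in range(2006, 2026)}
print(decadal_change(series))
```

In practice this would be repeated for each model and each scenario, giving one trajectory of decadal changes per model.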
Statistical analysis. To characterize the relationship between air temperature
and phenological timing (H1 and H2), we used ordinary linear regression, with
the observed phenological date as the dependent variable, y, and the measured
air temperature differential for each plot (see Supplementary Note 1) as the
independent variable, x. The regression slope thus gives the temperature
sensitivity in days per 1 °C warming for the linear temperature model. To
account for potential effects of elevated CO2 on phenology, we also analysed
data (where appropriate) using a linear temperature and CO2 model, which
included temperature, CO2 (elevated and ambient) and a temperature × CO2
interaction effect. All tests were two-sided, at a significance level of 0.05.
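The temperature-sensitivity estimate described above is the slope of an ordinary least-squares line. A minimal sketch follows, using only the standard library; the function name `temperature_sensitivity` and the example numbers are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def temperature_sensitivity(x: Sequence[float],
                            y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    x: air temperature differential per plot (degC)
    y: phenological date (day of year)
    Returns (slope, intercept); the slope is in days per 1 degC warming.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical plots: green-up advances by 2 days per degC of warming
warming = [0.0, 2.25, 4.5, 6.75, 9.0]
green_up_doy = [150.0 - 2.0 * w for w in warming]
slope, intercept = temperature_sensitivity(warming, green_up_doy)
print(slope, intercept)
```

A negative slope indicates an earlier date with warming; a positive slope indicates a later date.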

For breakpoint analysis (H3), we fit a three-parameter (a, b and x*) breakpoint
temperature model, which was specified as:

$$y = a + bx + \varepsilon \quad \text{for } x \le x^{*}$$

and

$$y = a + bx^{*} + \varepsilon \quad \text{for } x > x^{*}$$

in which x and y are as for the ordinary linear regression, ε is the regression
residual and x* is the temperature breakpoint as illustrated in Fig. 1. We
constrained x* to fall in the range of 0–9 °C. An edge-hitting value of
x* = 9 °C was obtained when the linear model fit the data every bit as well as
the breakpoint model.
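One simple way to fit a three-parameter breakpoint model is a grid search over candidate breakpoints. The sketch below assumes the model is linear up to the breakpoint and constant beyond it, with the breakpoint restricted to 0–9 °C; that functional form, the grid resolution, and the names `fit_breakpoint` and `_ols` are assumptions for illustration, and the authors' actual fitting routine is not described in the text.

```python
from typing import List, Optional, Sequence, Tuple


def _ols(z: List[float], y: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Least-squares fit y = a + b*z; returns (a, b, rss), or None if z is constant."""
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    if szz == 0.0:
        return None
    b = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y)) / szz
    a = mean_y - b * mean_z
    rss = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))
    return a, b, rss


def fit_breakpoint(x: Sequence[float], y: Sequence[float],
                   lo: float = 0.0, hi: float = 9.0,
                   steps: int = 181) -> Optional[Tuple[float, float, float, float]]:
    """Grid search for (a, b, breakpoint, rss) minimizing the residual sum of squares.

    For each candidate breakpoint xb, x is clipped at xb so that the fitted
    response is a + b*x below xb and the constant a + b*xb above it.
    """
    best = None
    for k in range(steps):
        xb = lo + (hi - lo) * k / (steps - 1)
        fit = _ols([min(xi, xb) for xi in x], y)
        if fit is None:
            continue  # all clipped values identical; slope undefined
        a, b, rss = fit
        if best is None or rss < best[3]:
            best = (a, b, xb, rss)
    return best


# Hypothetical data: response advances 3 days per degC, then plateaus at 4 degC
x = [float(i) for i in range(10)]
y = [100.0 - 3.0 * min(xi, 4.0) for xi in x]
print(fit_breakpoint(x, y))
```

With purely linear data the search returns the upper bound of the grid, which corresponds to the edge-hitting behaviour noted in the text.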

We used AIC to identify whether the linear model or the breakpoint model
was best supported by the available data. AIC is typically calculated as:

$$\mathrm{AIC} = n \log \sigma^{2} + 2p$$

in which n is the number of observations, p is the number of fit parameters
plus one, and σ² is the residual sum of squares divided by n. When n is small
relative to p, the small-sample-corrected criterion, AICc, is preferred:

$$\mathrm{AIC_c} = \mathrm{AIC} + \frac{2p(p+1)}{n - p - 1}$$

AIC effectively balances improving explanatory power (lower σ²) against
increasing complexity (larger p), and thus AIC selects against
over-parameterized models. The model with the lowest AIC is considered the best
model given the data, and the absolute difference in AICc scores between two
models can be used to evaluate the weight of evidence in support of the better
model. If the difference (ΔAICc) is small or zero then the two models are
equally good. But if ΔAICc = 2.0, then the model with the lower AICc is almost
three times more likely to be the best.
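The model-comparison arithmetic can be written out directly. The sketch below uses the standard least-squares forms AIC = n log(RSS/n) + 2p and AICc = AIC + 2p(p + 1)/(n − p − 1), with p equal to the number of fit parameters plus one, and the usual relative-likelihood expression exp(ΔAIC/2). The function names and the example numbers are illustrative assumptions.

```python
import math


def aic(rss: float, n: int, n_fit_params: int) -> float:
    """AIC for a least-squares fit; p counts the fit parameters plus one."""
    p = n_fit_params + 1
    sigma2 = rss / n  # residual sum of squares divided by n
    return n * math.log(sigma2) + 2 * p


def aicc(rss: float, n: int, n_fit_params: int) -> float:
    """Small-sample-corrected AIC."""
    p = n_fit_params + 1
    return aic(rss, n, n_fit_params) + 2 * p * (p + 1) / (n - p - 1)


def relative_likelihood(delta: float) -> float:
    """How many times more likely the lower-scoring model is to be the best."""
    return math.exp(delta / 2)


# Hypothetical comparison for n = 10 plots: linear (2 fit parameters) versus
# breakpoint (3 fit parameters) models with made-up residual sums of squares
linear = aicc(rss=40.0, n=10, n_fit_params=2)
breakpoint_model = aicc(rss=30.0, n=10, n_fit_params=3)
print(linear, breakpoint_model, relative_likelihood(abs(linear - breakpoint_model)))
print(relative_likelihood(2.0))  # about 2.72, that is, almost three times
```

Because the correction term grows quickly as p approaches n, an extra parameter has to reduce the residual variance substantially before the more complex model is preferred in small samples.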

Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Data availability. PhenoCam imagery is publicly available through the project
web page (http://phenocam.sr.unh.edu), and the phenological datasets used in
this study are available through the SPRUCE data portal.
Extended Data Fig. 1 | Air temperature and precipitation in the
SPRUCE S1 bog (August 2015 to December 2017) relative to long-term
(1960–2000) means and variability. a, Long-term daily mean temperature
(°C, ±1 s.d. indicated by shading), compared with daily mean temperature
(calculated from 30-min means, based on n=? sensors mounted at 2-m
height in each enclosure) in +0 °C enclosure (unheated control) and
+9.0 °C enclosure. b, Long-term monthly mean temperature (mean
daily maximum and mean daily minimum indicated by shaded bars),
compared with monthly mean temperature (calculated from daily means,
as in a) in different experimental treatments. c, Long-term monthly
mean precipitation (mm, ±1 s.d. indicated by shading, with maxima and
minima indicated by dotted lines), compared with measured monthly
precipitation (n = 1 rain gauge) in the S1 bog.
Extended Data Fig. 2 | Decadal mean temperature change (relative to 2006–2015 mean) projections from ten CMIP5 earth system models for the
SPRUCE site. a, Stabilization climate scenario (RCP4.5). b, High-emission climate scenario (RCP8.5).
start and end of the photosynthetic uptake period, as derived from
FLUXNET data for evergreen conifer-dominated sites. a–d, Across-site
patterns in spring (a) and autumn (b) in relation to mean annual
[…] sites) and within-site patterns in spring (c) and autumn (d) in
relation to seasonal temperature anomalies (n = 86 site-years).
Extended Data Fig. 4 | Unusually warm weather in late winter, followed
by extreme cold in early April, resulted in severe frost damage in the
warmest enclosures at SPRUCE in 2016. a, Time series of daily mean air
temperature, comparing plot 17 (+9.0 °C warming) and plot 19 (unheated
enclosure), during the winter and spring of 2016. By the time the frost
event occurred (grey shading), the daily mean temperature in plot 17
had been above freezing for over a month, but had repeatedly dropped
below freezing in plot 19. b, Time series of 30-min air temperature, again
comparing plot 17 and plot 19, leading up to and immediately following
the frost event, which occurred on the morning of 9 April and again on 12
April. The thin red lines indicate the variability (maximum and minimum)
across n = 5 temperature sensors in plot 17. c, Time series of daily GCC, the
green chromatic coordinate, for Picea trees in plot 17 and plot 19. Arrows
denote spring green-up dates (progressively larger arrows corresponding
to 10%, 25% and 50% of seasonal amplitude) estimated from GCC. The
pronounced decline in GCC in plot 17 following the frost event (grey
shading) is readily apparent. Trees in plot 19 retained sufficient frost
hardiness that they were undamaged, despite experiencing much colder
temperatures. d, Brown frost-damaged Larix foliage in plot 17. e, Picea
branches in plot 17, showing loss of most foliage from previous years, with
green foliage from the 2015 flush retained only at branch tips. f, Picea
branches with frost-damaged foliage from previous years, but healthy
green foliage from the 2016 flush.
Extended Data Table 1 | Mean daily air temperature and temperature differentials associated with whole-ecosystem warming
Extended Data Table 2 | Effect of SPRUCE warming treatments on spring green-up and autumn green-down
Extended Data Table 3 | Projected future extension of the period of vegetation activity
Extended Data Table 4 | Effect of SPRUCE warming treatments on observed vegetative and reproductive phenological transitions (2016)
Extended Data Table 5 | Effect of SPRUCE warming treatments on observed vegetative and reproductive phenological transitions (2017)
Extended Data Table 6 | Impact of premature foliar senescence on nutrient content of L. laricina and P. mariana litter




LETTER 


Accumulation of 8,9-unsaturated sterols drives 
oligodendrocyte formation and remyelination 


Zita Hubler, Dharmaraja Allimuthu, Ilya Bederman, Matthew S. Elitt, Mayur Madhavan, Kevin C. Allan,
H. Elizabeth Shick, Eric Garrison, Molly T. Karl, Daniel C. Factor, Zachary S. Nevin, Joel L. Sax, Matthew A. Thompson,
Yuriy Fedorov, Jing Jin, William K. Wilson, Martin Giera, Franz Bracher, Robert H. Miller, Paul J. Tesar & Drew J. Adams


Regeneration of myelin is mediated by oligodendrocyte progenitor
cells, an abundant stem cell population in the central nervous
system (CNS) and the principal source of new myelinating
oligodendrocytes. Loss of myelin-producing oligodendrocytes in
the CNS underlies a number of neurological diseases, including
multiple sclerosis and diverse genetic diseases. High-throughput
chemical screening approaches have been used to identify small
molecules that stimulate the formation of oligodendrocytes
from oligodendrocyte progenitor cells and functionally enhance
remyelination in vivo. Here we show that a wide range of these
pro-myelinating small molecules function not through their
canonical targets but by directly inhibiting CYP51, TM7SF2, or
EBP, a narrow range of enzymes within the cholesterol biosynthesis
pathway. Subsequent accumulation of the 8,9-unsaturated sterol
substrates of these enzymes is a key mechanistic node that promotes
oligodendrocyte formation, as 8,9-unsaturated sterols are effective
when supplied to oligodendrocyte progenitor cells in purified form
whereas analogous sterols that lack this structural feature have
no effect. Collectively, our results define a unifying sterol-based
mechanism of action for most known small-molecule enhancers of
oligodendrocyte formation and highlight specific targets to propel
the development of optimal remyelinating therapeutics.

Imidazole antifungal drugs are a structurally diverse class of small
molecules that robustly stimulate the generation of new mouse and
human oligodendrocytes and enhance remyelination in mouse models
of disease. Imidazole antifungals mediate their effects in yeast by
inhibiting CYP51, an essential enzyme for sterol biosynthesis in both
fungal and mammalian cells (for a detailed cholesterol biosynthesis
diagram, see Extended Data Fig. 1). Across a panel of nine
azole-containing molecules, the ability to inhibit CYP51 in vitro and in
oligodendrocyte progenitor cells (OPCs) predicted enhanced formation of
myelin basic protein-positive (MBP+) oligodendrocytes from mouse
epiblast stem cell-derived OPCs (Fig. 1a–d, Extended Data Fig. 2a–c).
To measure inhibition of CYP51 in OPCs, we used gas chromatography
and mass spectrometry (GC–MS) to quantify the increase in levels of
lanosterol (the substrate of CYP51) and decrease in cholesterol
(Fig. 1b, Extended Data Fig. 2c–e). In cells treated with ketoconazole,
the dose–response curve for accumulation of lanosterol closely resembled
the dose–response curve for enhanced oligodendrocyte formation
(Fig. 1c, Extended Data Fig. 2b, f, g). Notably, we confirmed all effects of
small molecules on oligodendrocyte formation and sterol levels using
a second, independently isolated batch of OPCs, and key results were
also validated using mouse primary OPCs (Extended Data Fig. 2h, i;
see Methods). In addition, the effects of azole molecules were
confirmed using an orthogonal image quantification approach, a second
oligodendrocyte marker, and liquid chromatography with mass
spectrometry (LC–MS) to detect cellular sterols (Extended Data Fig. 2j–l).


We next used RNA interference and metabolite supplementation to
independently confirm the role of CYP51 in oligodendrocyte formation.
Cell-permeable small interfering RNA (siRNA) reagents depleted
CYP51 transcript levels in OPCs by 80%, led to substantial accumulation
of lanosterol, and enhanced formation of MBP+ oligodendrocytes
(Fig. 1e, f, Extended Data Fig. 2m–o). In addition, we treated OPCs
directly with purified lanosterol and observed enhanced formation
of MBP+ oligodendrocytes in a dose-responsive fashion (Fig. 1g, h,
Extended Data Fig. 2p, q). These findings support the idea that CYP51
is the functional target of imidazole antifungals in OPCs and suggest
that accumulation of sterol intermediates may play a direct role in
enhancing oligodendrocyte formation.

As inhibition of CYP51 was sufficient to induce the formation of
oligodendrocytes, we used a chemical genetics approach to test whether
modulation of other steps in cholesterol biosynthesis had a similar
effect (Fig. 2a, Extended Data Fig. 1). We used GC–MS-based sterol
profiling in OPCs to confirm that a panel of eight small molecules
selectively inhibited their known enzyme targets within the cholesterol
biosynthesis pathway (Extended Data Fig. 3a–d; see Source Data
for abundance of all quantified metabolites in all GC–MS-based sterol
profiling experiments). Only molecules targeting CYP51 (ketoconazole),
TM7SF2 (amorolfine), and EBP (TASIN-1) enhanced formation
of MBP+ oligodendrocytes, whereas inhibitors of the five other
pathway enzymes were ineffective (Fig. 2b, Extended Data Fig. 3e–h).
Treatments had little effect on cell number (Extended Data Fig. 3).
Concentrations of amorolfine and TASIN-1 that enhanced oligodendrocyte
formation also led to accumulation of 14-dehydrozymostenol
and zymostenol, respectively (Extended Data Fig. 3). Moreover,
distinct structural classes of inhibitors of CYP51, TM7SF2 and EBP
comparably enhanced oligodendrocyte formation, including at
picomolar doses (Extended Data Fig. 4a–h).

We also used CRISPR–Cas9 targeting to evaluate the effects of
genetic suppression of EBP. OPCs expressing Cas9 and guide RNA
targeting EBP demonstrated reduced EBP transcript levels, robust
accumulation of the expected intermediate zymostenol, and enhanced
formation of oligodendrocytes under differentiation-permissive
conditions (Fig. 2c, d, Extended Data Fig. 4k). Two independent guide RNA
sequences produced similar results (Extended Data Fig. 4i–l). In total,
this genetic and chemical genetic analysis suggests that inhibition of the
cholesterol biosynthesis pathway within a limited window of enzymes
between CYP51 and EBP is sufficient to enhance the formation
of oligodendrocytes.

The efficacy of these small molecules and genetic perturbations
was not mediated by a simple reduction in sterol levels, as treatment
with statin drugs or methyl β-cyclodextrin depleted cholesterol levels
comparably without enhancing oligodendrocyte formation (Fig. 2b,
Extended Data Figs 3b, 5a, b). Because treatment of OPCs with the
Fig. 1 | Imidazoles inhibit CYP51 to enhance oligodendrocyte
formation. a, Rat CYP51 enzymatic activity following treatment
with azoles. n = 2 independent enzymatic assays. b, GC–MS-based
quantification of lanosterol levels in OPCs treated with the indicated
azoles at 2.5 μM. n = 2 wells per condition. c, f, g, Percentage of MBP+
oligodendrocytes generated from OPCs following treatment with
azoles (c), cell-permeable siRNA reagents (f), or lanosterol (g). n > 4 wells
per condition; for exact well counts in all figures, see Methods section
'Statistics and reproducibility'. In f, *P = 0.0008, two-tailed Student's
t-test. d, Representative images of OPCs treated with the indicated
azoles. Nuclei are labelled with DAPI (blue) and oligodendrocytes are
indicated by immunostaining for MBP (green). Scale bar, 100 μm.
e, GC–MS-based quantification of lanosterol levels in OPCs treated with
the indicated reagents. n = 2 wells per condition. h, Structure of lanosterol.
All bar graphs indicate mean ± s.d. Results in c, d, g are representative
of three independent experiments; those in b, f are representative
of two independent experiments using OPC-5 cells; for validation in
an independent derivation of OPCs, see Extended Data Fig. 2. Keto,
ketoconazole.


CYP51 substrate lanosterol enhanced oligodendrocyte formation,
we examined the effects of other purified sterols. Treatment of OPCs
with 8,9-unsaturated sterols, including 14-dehydrozymostenol (which
accumulates following TM7SF2 inhibition) and zymostenol (which
accumulates following EBP inhibition), enhanced the formation of
MBP+ oligodendrocytes. By contrast, sterols lacking 8,9-unsaturation,
including cholesterol itself, were ineffective (Fig. 2e, h, Extended Data
Fig. 5c). A total of nine natural and unnatural 8,9-unsaturated sterols
enhanced oligodendrocyte formation from OPCs, with
2,2-dimethylzymosterol the most potent among those evaluated to date (Fig. 2f,
Extended Data Fig. 5d–l, o). Conversely, treating OPCs with Ro
48-8071, which inhibits lanosterol synthase and thereby prevents
the accumulation of 8,9-unsaturated sterols, abrogated the enhanced
oligodendrocyte formation induced by the CYP51 inhibitor
ketoconazole (Extended Data Fig. 5m, n, p). In addition, analogues of either
zymostenol or 8-dehydrocholesterol that lacked 8,9-unsaturation were
inactive, demonstrating that 8,9-unsaturation is a crucial structural
feature for activity in OPCs (Fig. 2g, Extended Data Fig. 5k, l). Finally,
co-treating OPCs with ketoconazole and MAS-412 provided no further
benefit over ketoconazole alone, confirming that these molecules act
through a redundant mechanism (Extended Data Fig. 5q, r). Together
these findings indicate that the accumulation of 8,9-unsaturated sterols
in OPCs is a central mechanism for enhancing oligodendrocyte formation,
whether these sterols arise from small-molecule inhibition of cholesterol
biosynthesis enzymes or are supplied to OPCs in purified form.

Most of the 8,9-unsaturated sterols that are shown here to enhance
oligodendrocyte formation have previously been shown to function as
signalling lipids in oocytes by inducing the resumption of meiosis.
While the direct cellular targets of 8,9-unsaturated 'meiosis-activating
sterols' remain poorly understood, there is evidence nuclear hormone
receptors (NHRs) may play a role. We evaluated 2,2-dimethylzymosterol
and the pathway inhibitors ketoconazole and TASIN-1 in cell-based
reporter assays for 20 NHRs, but no molecule showed significant
activity in any assay (Extended Data Fig. 5s–u). Additional experiments
discounted a role for SREBP2, which transcriptionally regulates
cholesterol homeostasis, suggesting that these sterols act by
mechanisms beyond NHRs or SREBP2 (Extended Data Fig. 5). Together,
these studies suggest a novel role for the meiosis-activating sterols in
promoting oligodendrocyte formation.

In parallel, we executed a screen of over 3,000 bioactive small
molecules and approved drugs at a uniform dose of 2 μM (Extended Data
Fig. 6a). In addition to molecules previously annotated as enhancing
OPC differentiation, we also identified many confirmed hits
whose known targets did not cluster into easily discernible categories
(Supplementary Table 1). Among the top ten novel enhancers of
oligodendrocyte formation, four molecules had previously been shown to
inhibit TM7SF2 or EBP in CNS-derived cells. In fact, GC–MS-based
sterol profiling revealed that all ten top hits led to accumulation of
8,9-unsaturated sterols at the screening dose, whereas randomly selected
library members had no effect on sterol levels or oligodendrocyte
formation (Fig. 3a, Extended Data Fig. 6b–f).

Given the frequency of cholesterol pathway modulators within our top
screening hits, we assessed whether any previously reported enhancers
of remyelination identified by drug screening might also induce
accumulation of sterol intermediates. At concentrations that promoted
oligodendrocyte formation, benztropine, clemastine, tamoxifen,
and U50488 induced accumulation of zymostenol and zymosterol
and decreased basal sterol levels, indicating that they inhibited EBP in
OPCs (Fig. 3b, Extended Data Fig. 6g–l). Tamoxifen has been shown
to inhibit the enzymatic activity of EBP directly, and we
confirmed that benztropine, clemastine, tamoxifen, U50488, and several
high-throughput screening (HTS) hits all inhibited EBP directly in a
biochemical assay (Fig. 3). By contrast, liothyronine and bexarotene
showed minimal effects on sterol levels in OPCs (Fig. 3b, Extended
Data Fig. 6g), consistent with their known functions as modulators of
transcription factor function and confirming that many, but not all,
treatments that enhance oligodendrocyte formation cause accumulation
of 8,9-unsaturated sterols.

While each of these bioactive small molecules has a previously
annotated 'canonical' target, extensive structure–activity relationship data
show that the ability to inhibit EBP, rather than the canonical target,
predicts enhanced oligodendrocyte formation. For example, we
validated a panel of six muscarinic receptor antagonists that all showed
near-complete inhibition of the M1, M3, and M5 muscarinic receptor
subtypes at the HTS dose of 2 μM (Extended Data Fig. 6m, p). Among
these molecules, only clemastine and benztropine inhibited EBP in
OPCs, and only clemastine and benztropine enhanced oligodendrocyte
formation (Extended Data Fig. 6j, k, m~F). Likewise, among selective
oestrogen receptor modulators (SERMs), toremifene and ospemifene
are structurally near-identical and show comparable cellular
anti-oestrogen activity. However, only toremifene inhibited EBP in
OPCs, and only toremifene enhanced oligodendrocyte formation
Fig. 2 | Small-molecule inhibition of CYP51, TM7SF2, or EBP enhances
oligodendrocyte formation via accumulation of 8,9-unsaturated sterols.
a, Abbreviated cholesterol biosynthesis pathway. For greater detail, see
Extended Data Fig. 1. FF-MAS, follicular fluid meiosis-activating sterol.
b, Percentage of MBP+ oligodendrocytes generated from OPCs treated with
the indicated pathway inhibitors. n > 4 wells per condition. c, Percentage of
MBP+ oligodendrocytes generated from OPCs expressing Cas9 and guide
RNA targeting EBP. n > 3 wells per condition. d, Functional validation


(Extended Data Fig. 7a–g). Conversely, while 4-hydroxy-tamoxifen,
as expected, showed 100-fold enhanced cellular anti-oestrogen activity
relative to tamoxifen, both molecules have comparable potency for
inhibition of EBP and comparable potency for enhancing oligodendrocyte
formation (Extended Data Fig. 7h–j). Finally, the leading novel hit
from our HTS, EPZ005687, was annotated as an inhibitor of the histone
methyltransferase EZH2. However, analysis of three additional structurally
related EZH2 inhibitors revealed that only EPZ005687 inhibited EBP
and enhanced oligodendrocyte formation (Extended Data Fig. 7k–r).
Across various antimuscarinic agents, SERMs, and EZH2 inhibitors,
the ability to inhibit EBP, rather than each molecule's canonical activity,
predicted enhanced oligodendrocyte formation.

We next tested the potential for combinations of small molecules to
show additive or non-additive effects. Combining the thyroid hormone
agonist liothyronine with a range of treatments that both modulated
sterols and induced differentiation of OPCs produced additive effects
on oligodendrocyte formation, indicating that these molecules are
likely to function by mechanisms other than thyroid hormone receptor
signalling to enhance oligodendrocyte generation (Extended Data
Fig. 8a, b). By contrast, combinations of ketoconazole at a maximally
effective dose with benztropine, clemastine, tamoxifen, or U50488 did
not enhance differentiation above levels seen for ketoconazole alone
of Cas9-based targeting of EBP using GC–MS-based quantification of
zymostenol levels. n = 2 wells per condition. e–g, Percentage of MBP+
oligodendrocytes generated from OPCs treated with the indicated purified sterols.
n > 4 wells per condition. h, Structures of various sterols. All bar graphs
indicate mean ± s.d. See Methods section 'Statistics and reproducibility'
for exact well counts. Experiments in b–g are representative of two or
more independent experiments using OPC-5 cells; for validation in an
independent derivation of OPCs, see Extended Data Figs 3–5.


(Extended Data Fig. 8c–e), consistent with these molecules sharing
8,9-unsaturated sterol accumulation as a common mechanism for
induction of oligodendrocyte formation.

Because our in vitro OPC assays modelled only the initial differentiation
event into oligodendrocytes, we next tested whether sterol pathway
modulation also enhanced subsequent oligodendrocyte maturation
and myelination in vitro and in vivo. First, we cultured OPCs on
electrospun microfibres to assess the effects of sterol pathway modulators
on the ability of oligodendrocytes to track and wrap along axon-like
substrates. Ketoconazole (CYP51), amorolfine (TM7SF2), and
TASIN-1 (EBP) all robustly enhanced tracking along and wrapping
around microfibres by MBP+ oligodendrocytes. By contrast, inhibition
of other enzymes up- or downstream in the pathway had little
effect on oligodendrocyte maturation and ensheathment of microfibres
(Extended Data Fig. 8f–o).

The imidazole antifungal miconazole, which targets CYP51,
penetrates the mouse blood–brain barrier and enhances remyelination
in mouse models of demyelination. Here we evaluated brain-penetrant
molecules with affinity for TM7SF2 (ifenprodil) and EBP (tamoxifen)
using a well-established mouse model in which injection of
lysolecithin is used to create focal lesions of demyelination in the dorsal
column white matter of the adult spinal cord. In vehicle-treated mice,
Fig. 3 | Inhibition of TM7SF2 and EBP is a unifying mechanism
for many small-molecule enhancers of oligodendrocyte formation.
a, Quantification of sterol levels in OPCs treated with the indicated
molecules at 2 μM (one well per condition; for validation in a second
derivation of OPCs, see Extended Data Fig. 6). b, Quantification of
sterol levels in OPCs treated with the indicated previously reported


profiles of sparsely distributed remyelinating axons characterized by
thin myelin sheaths were detected mainly at the periphery of the lesion,
while ultrastructural analyses revealed unmyelinated axons or axons
with a single wrap of myelin (Fig. 4a, b). By contrast, after eight days of
treatment with ifenprodil or tamoxifen, remyelination was widespread
throughout the lesion (Fig. 4a, b, Extended Data Fig. 9a), consistent


Fig. 4 | Accumulation of 8,9-unsaturated sterols enhances
remyelination in vivo and in human brain spheroids. a, Representative
electron microscopy images of LPC-lesioned dorsal spinal cord from
mice treated with ifenprodil or tamoxifen. Scale bar, 5 μm. b, Tukey plot
showing quantification of remyelinated axons in LPC-lesioned spinal
cord from mice in a. n = 6 animals per group except vehicle.
**P = 0.0004, *P = 0.048, two-tailed Mann–Whitney test. Boxes indicate
the interquartile range, horizontal lines represent the median, and
whiskers represent the smaller of 1.5 times the interquartile range and
the minimum–maximum range. c, Quantification of brain sterol levels in
mice treated with miconazole, ifenprodil, or tamoxifen. n = 4 animals per
group. P = 0.0007 for miconazole, P = 0.0003 for ifenprodil, P = 0.0006
for tamoxifen; two-tailed Student's t-test. d, Quantification of myelin
regulatory factor (MYRF)+ oligodendrocytes within human myelinating
cortical spheroids following treatment with miconazole (2 μM) or
ifenprodil (2 μM). n = 4 spheroids per treatment condition. P = 0.0009
for miconazole, P = 0.0009 for ifenprodil; two-tailed Student's t-test.
e, Representative images of spheroids. DAPI+ nuclei (blue) and MYRF+
oligodendrocytes (red) are labelled. Scale bar, 100 μm. In c, bar graphs
indicate mean and error bars indicate s.d.
with a recent report regarding tamoxifen. Critically, we used GC–MS-based
sterol profiling of brain tissue from mice treated with miconazole,
ifenprodil, and tamoxifen to demonstrate that these therapeutic dosing
regimens all led to substantial accumulation of 8,9-unsaturated sterols
within the mouse brain, indicating that CYP51, TM7SF2, and EBP,
respectively, were inhibited (Fig. 4c). Collectively these data show that
small-molecule inhibitors of CYP51, TM7SF2, and EBP can engage
their sterol pathway targets and enhance remyelination in mice.

Finally, the oligodendrocyte-enhancing and sterol-modulating
activities of leading pathway inhibitors extend to human cells and
tissue. Various small molecules caused accumulation of the expected
8,9-unsaturated sterol intermediates both in a human glioma cell line
and in human pluripotent stem cell-derived cortical spheroids,
confirming that these molecules similarly engage the sterol synthesis
pathway in mouse and human cells and CNS tissue (Extended Data Fig. 9).
Critically, miconazole and ifenprodil also substantially enhanced
the generation of human oligodendrocytes in a 3D human pluripotent
stem cell-derived cortical spheroid model, indicating conservation of
function across species (Fig. 4).

We have defined a dominant mechanism shared by many small-molecule
enhancers of remyelination: elevation of levels of 8,9-unsaturated
sterol intermediates by inhibition of a narrow range of
cholesterol biosynthesis enzymes between CYP51 and EBP. We have
identified 27 small molecules that both enhance oligodendrocyte
formation and increase levels of 8,9-unsaturated sterol intermediates.
Mechanistically, several lines of evidence support a central
signalling role for 8,9-unsaturated sterols in the observed enhanced
oligodendrocyte formation, including the ability of nine independent
8,9-unsaturated sterols to enhance the formation of oligodendrocytes
when supplied to OPCs (Extended Data Fig. 10).

Myelin is cholesterol-enriched, and past work has established that
genetic or pharmacological treatments that inhibit early enzymes in
cholesterol biosynthesis lead to hypomyelination in vivo. Our work
supports these observations, as inhibition of HMG-CoA reductase or
squalene synthase had neutral-to-negative effects on oligodendrocyte
formation in our assays (Fig. 2b, Extended Data Fig. 3). These
enzymes catalyse steps before the synthesis of the first sterol
intermediate, so their inhibition prevents the synthesis of all cellular sterols.
Our findings establish an alternate paradigm in which the cholesterol
biosynthesis pathway can be leveraged to enhance the formation of
new oligodendrocytes by targeting later steps whose inhibition does
not cause net depletion of cellular sterols. Instead, acute inhibition of
CYP51, TM7SF2, or EBP during OPC differentiation induces a 'sterol
shift' in which a minority of cellular cholesterol is diverted to
8,9-unsaturated sterol intermediates that signal to enhance oligodendrocyte
formation. Notably, we and others have independently shown that
multiple molecules now annotated by us as enhancing 8,9-unsaturated
sterol intermediate levels can regenerate functional myelin in vivo,
as evidenced by reversal of paralysis in mice with MS-like disease.
Ultimately, our work demonstrates that modulating the sterol
landscape in OPCs can enhance the formation of oligodendrocytes and
points to new therapeutic targets, potent inhibitors for these targets,
and metabolite-based biomarkers to accelerate the development of
optimal remyelinating therapeutics.


Online content
Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0360-3.


Received: 10 May 2017; Accepted: 3 May 2018;
Published online 25 July 2018.


1. Goldman, S. A., Nedergaard, M. & Windrem, M. S. Glial progenitor cell-based treatment and modeling of neurological disease. Science 338, 491-495 (2012).
2. Fancy, S. P. et al. Overcoming remyelination failure in multiple sclerosis and other myelin disorders. Exp. Neurol. 225, 18-23 (2010).
3. Franklin, R. J. & ffrench-Constant, C. Remyelination in the CNS: from biology to therapy. Nat. Rev. Neurosci. 9, 839-855 (2008).
4. Najm, F. J. et al. Drug-based modulation of endogenous stem cells promotes functional remyelination in vivo. Nature 522, 216-220 (2015).
5. Deshmukh, V. A. et al. A regenerative approach to the treatment of multiple sclerosis. Nature 502, 327-332 (2013).
6. Mei, F. et al. Micropillar arrays as a high-throughput screening platform for therapeutics in multiple sclerosis. Nat. Med. 20, 954-960 (2014).
7. Mei, F. et al. Identification of the kappa-opioid receptor as a therapeutic target for oligodendrocyte remyelination. J. Neurosci. 36, 7925-7935 (2016).
8. Huang, J. K. et al. Retinoid X receptor gamma signaling accelerates CNS remyelination. Nat. Neurosci. 14, 45-53 (2011).
9. Gonzalez, G. A. et al. Tamoxifen accelerates the repair of demyelinated lesions in the central nervous system. Sci. Rep. 6, 31599 (2016).
10. Lariosa-Willingham, K. D. et al. A high throughput drug screening assay to identify compounds that promote oligodendrocyte differentiation using acutely dissociated and purified oligodendrocyte precursor cells. BMC Res. Notes 9, 419 (2016).
11. Korade, Z. et al. The effect of small molecules on sterol homeostasis: measuring 7-dehydrocholesterol in Dhcr7-deficient Neuro2a cells and human fibroblasts. J. Med. Chem. 59, 1102-1115 (2016).
12. Giera, M., Müller, C. & Bracher, F. Analysis and experimental inhibition of distal cholesterol biosynthesis. Chromatographia 78, 343-358 (2015).
13. Giera, M., Plössl, F. & Bracher, F. Fast and easy in vitro screening assay for cholesterol biosynthesis inhibitors in the post-squalene pathway. Steroids 72, 633-642 (2007).
14. Mir, F. & Le Breton, G. C. A novel nuclear signaling pathway for thromboxane A2 receptors in oligodendrocytes: evidence for signaling compartmentalization during differentiation. Mol. Cell. Biol. 28, 6329-6341 (2008).
15. Jachak, G. R. et al. Silicon incorporated morpholine antifungals: design, synthesis, and biological evaluation. ACS Med. Chem. Lett. 6, 1111-1116 (2015).
16. Zhang, L. et al. Selective targeting of mutant adenomatous polyposis coli (APC) in colorectal cancer. Sci. Transl. Med. 8, 361ra140 (2016).
17. De Brabander, J., Shay, J. W., Wang, W., Nijhawan, D. & Theodoropoulos, P. Targeting emopamil binding protein (EBP) with small molecules that induce an abnormal feedback response by lowering endogenous cholesterol biosynthesis. US patent application US 2016/0313302 A1 (2016).
18. Saher, G. et al. Therapy of Pelizaeus-Merzbacher disease in mice by feeding a cholesterol-enriched diet. Nat. Med. 18, 1130-1135 (2012).
19. Byskov, A. G., Andersen, C. Y. & Leonardsen, L. Role of meiosis activating sterols, MAS, in induced oocyte maturation. Mol. Cell. Endocrinol. 187, 189-196 (2002).
20. Grøndahl, C. Oocyte maturation. Basic and clinical aspects of in vitro maturation (IVM) with special emphasis of the role of FF-MAS. Dan. Med. Bull. 55, 1-16 (2008).
21. Canfrán-Duque, A. et al. Atypical antipsychotics alter cholesterol and fatty acid metabolism in vitro. J. Lipid Res. 54, 310-324 (2013).
22. Moebius, F. F. et al. Pharmacological analysis of sterol delta8-delta7 isomerase proteins with [3H]ifenprodil. Mol. Pharmacol. 54, 591-598 (1998).
23. Gylling, H. et al. Tamoxifen and toremifene lower serum cholesterol by inhibition of delta 8-cholesterol conversion to lathosterol in women with breast cancer. J. Clin. Oncol. 13, 2900-2905 (1995).
24. Bechler, M. E., Byrne, L. & ffrench-Constant, C. CNS myelin sheath lengths are an intrinsic property of oligodendrocytes. Curr. Biol. 25, 2411-2416 (2015).
25. Lee, S. et al. A culture system to study oligodendrocyte myelination processes using engineered nanofibers. Nat. Methods 9, 917-922 (2012).
26. Mi, S. et al. Promotion of central nervous system remyelination by induced differentiation of oligodendrocyte precursor cells. Ann. Neurol. 65, 304-315 (2009).
27. Madhavan, M. et al. Induction of myelinating oligodendrocytes in human cortical spheroids. Nat. Methods https://doi.org/10.1038/s41592-018-0081-4 (2018).
28. Miron, V. E. et al. Statin therapy inhibits remyelination in the central nervous system. Am. J. Pathol. 174, 1880-1890 (2009).
29. Klopfleisch, S. et al. Negative impact of statins on oligodendrocytes and myelin formation in vitro and in vivo. J. Neurosci. 28, 13609-13614 (2008).
30. Saher, G. et al. High cholesterol level is essential for myelin membrane growth. Nat. Neurosci. 8, 468-475 (2005).


Acknowledgements This work was supported by National Institutes of Health grant NS095280 (R.H.M., P.J.T.), Conrad N. Hilton Foundation Pilot Innovator in MS Award (D.J.A.), Mallinckrodt Foundation Grant Award (D.J.A.), Mt. Sinai Health Care Foundation, philanthropic support from the Peterson, Fakhouri, Long, Goodman, Geller, Judge, and Weidenthal families, and unrestricted support from the CWRU School of Medicine. Z.H., M.S.E., K.C.A., Z.S.N. and J.S. were supported by the CWRU Medical Scientist Training Program (NIH T32 GM007250). Z.H. was also supported by NIH TL1 TR000441. Additional support was provided by the Small Molecule Drug Development, Proteomics, and Translational Research Shared Resources of the Case Comprehensive Cancer Center (P30 CA043703). We acknowledge use of the Leica SP8 confocal microscope in the Light Microscopy Imaging Facility at CWRU made available through the Office of Research Infrastructure (NIH-ORIP) Shared Instrumentation Grant (S10 OD016164). We thank

M. Drumm, Mille Karl. yoha-Bello 1 Pink, P- Conrad R Lee, 

X LUD. Schlatzar K Polak, Janssen Pharmaceutica NV, CXR Biosciences, 
ThermoFisher, Avanti Polar Lipids, and the P Sache laboratory for technical assistance and discussion.


Author contributions Z.H., D.A., M.S.E., MI., Z.S.N., K.C.A., H.E.S., M.A.T. and D.J.A. evaluated the effects of small molecules and genetic manipulations on oligodendrocyte formation in vitro. Z.H., D.A., (8., M.A.T., FS and D.J.A. performed and analysed sterol profiling experiments in OPCs in vitro. D.C.F., Y.F., P.J.T. and D.J.A. performed high-throughput screening. Z.H., LB, H.E.S., GMM, MIGRHM, P.J.T. and D.A. evaluated the in vivo efficacy of small molecules on remyelination and sterol levels. 242 and J.S. profiled nuclear hormone receptors. Z.H., M.M. and Z.S.N. performed experiments on human cortical spheroids. W.W., M.G. and F.B. synthesized and purified sterol reagents. Z.H., D.A., P.J.T. and D.J.A. analysed all data and wrote the manuscript. All authors provided intellectual input, edited and approved the final manuscript.


Competing interests D.J.A., P.J.T., Z.H., D.A., M.S.E. and R.H.M. are inventors on patents and patent applications that relate to this work and have been licensed to Convelo Therapeutics, Inc., which seeks to develop remyelinating therapeutics. D.J.A. and P.J.T. hold equity in Convelo Therapeutics, Inc. and receive consulting income from Convelo Therapeutics, Inc. After resubmission of this work, D.C.F. became an employee of Convelo Therapeutics, Inc.


Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41586-018-0360-3.

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-018-0360-3.

Reprints and permissions information is available at http://www.nature.com/reprints.

Correspondence and requests for materials should be addressed to D.J.A.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


© Springer Nature Limited. All rights reserved.


METHODS 
Statistics and reproducibility. No statistical methods were used to predetermine sample size. Data were expressed as mean ± s.d. and P values were calculated using an unpaired two-tailed Student's t-test for pairwise comparison of variables with a 95% confidence interval and n − 2 degrees of freedom, where n is the total number of samples, in all figures except Fig. 4b. In Fig. 4b, P values were calculated using an unpaired two-tailed Mann-Whitney test with 95% confidence interval and the data plotted as a Tukey box and whisker plot. Boxes indicate the interquartile range,
and the horizontal line represents the median. Biological replicates: ig en 
wells per condition, except DMSO, »—24; Fig. Ll —17 wells for DMSO, 
forsiContrl and siCYPS1; Fig 1g, 1—=8 wells for DMSO and n= 4 for lanostro: 
Fig. 2b, n= wells per condition, except DMSO, =24: Fig. 2c, =3 wells for 
‘Control and n =4 for sgEBP: Fig 2e-g,=4 well per condition, except 18 
for DMSO and —7 fr ketoconazole in Fig. 2, n— 12 for DMSO in Fig 26, 16 
for DMSO and ketoconazole,» 8 fr cholesterol in Fig 2g. Independent exper 
iments: Fig 2, Fare representative of three and Fig. 2c, gf two independent 
experiments using OPC-5 calls for validation in an independent derivation of 
(OPCs, see Extended Data Figs. 3-5 
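
The pairwise comparison described in this section can be illustrated with a minimal sketch. It assumes the pooled-variance (equal-variance) form of Student's t-test, which is the form that has n − 2 degrees of freedom; the well values and the function name `unpaired_t` are hypothetical and are not data from this study.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Return (t, df) for an unpaired, pooled-variance Student's t-test."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2  # n - 2, where n is the total number of samples
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical percentages of MBP+ cells per well
dmso = [4.0, 5.0, 6.0, 5.0]
treated = [9.0, 11.0, 10.0, 10.0]
t_stat, dof = unpaired_t(treated, dmso)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A two-tailed P value would then be read from the t distribution with `dof` degrees of freedom, for example via `scipy.stats.ttest_ind(treated, dmso, equal_var=True)` if SciPy is available.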

Small molecules. The identity and purity of small molecules were authenticated by LC-MS before use (Supplementary Table 2). The following compounds were purchased from Sigma-Aldrich as solids: ketoconazole, miconazole, clotrimazole, fluconazole, fulvestrant, ifenprodil, benztropine, bexarotene, tamoxifen, 4-hydroxytamoxifen, medroxyprogesterone acetate, ospemifene, GSK343, trans-U50488, methyl-β-cyclodextrin, 5α-cholestan-3β-ol and cholesterol. The following compounds were purchased from Cayman Chemical as solids: liothyronine, clemastine, AY9944, YM53601 and Ro-48-8071. The following compounds were obtained from Janssen Pharmaceuticals as solids: 2-methyl-ketoconazole, R-trans-ketoconazole, and S-trans-ketoconazole. Mevastatin was purchased as a solid from Selleck Chemicals. The following compounds were purchased from Selleck Chemicals as 10 mM DMSO solutions: bifonazole, butoconazole, amorolfine, toremifene, EPZ005687, EPZ6438, UNC1999, hydroxyzine, ziprasidone, p-fluorohexahydro-sila-difenidol (abbreviated in figures as Sigma H1127), vesamicol, raloxifene, L-745,870, TMB-8, pramoxine, varespladib, tanshinone-I, levofloxacin, nateglinide, abiraterone, allopurinol, detomidine, rivastigmine, β-carotene, BEZ-235, scopolamine, and homatropine. Pirenzepine and telenzepine were purchased from Sigma-Aldrich as 1 mM DMSO solutions. Cholesterol biosynthetic intermediates were purchased from Avanti Polar Lipids as solids: lanosterol, zymosterol, zymostenol, lathosterol, desmosterol, 7-dehydrodesmosterol, FF-MAS (4,4-dimethyl-5α-cholesta-8,14,24-trien-3β-ol), 9-dehydrocholesterol, and 2,2-dimethylzymosterol (2,2-dimethyl-5α-cholesta-8,24-dien-3β-ol). 14-Dehydrozymostenol (5α-cholesta-8,14-dien-3β-ol), MAS-412 (4,4-dimethyl-5α-cholesta-8,14-dien-3β-ol), and MAS-414 (4,4-dimethyl-5α-cholest-8-en-3β-ol) were provided by F.B. Imidazole 124, TASIN-1, TASIN-149, and MGI39 were synthesized as reported. T-MAS (4,4-dimethyl-5α-cholesta-8,24-dien-3β-ol) from HPLC purification of yeast extracts was provided by J. and W.W.

Mouse OPC preparation. To rigorously assess the effects of small-molecule and genetic treatments on OPCs, all treatments were assayed in two batches of epiblast stem cell-derived OPCs, and key results were confirmed using mouse primary OPCs. OPCs were generated from two separate EpiSC lines, EpiSC5 (giving rise to OPC-5 OPCs) and 12901 (giving rise to OPC-1 OPCs). Unless otherwise noted, results in OPC-5 cells are presented in Figs. 1-4 while results in OPC-1 are presented in Extended Data Figs. 1-10.

EpiSC-derived OPCs were obtained using in vitro differentiation protocols and culture conditions described previously. To ensure uniformity throughout all in vitro screening experiments, EpiSC-derived OPCs were sorted to purity by fluorescence-activated cell sorting at passage five with conjugated CD140a-APC (eBioscience, 17-140; 1:80) and NG2-AF488 (Millipore, AB5320A4; 1:100) antibodies. Sorted batches of OPCs were expanded and frozen down in aliquots. OPCs were thawed into growth conditions for one passage before use in further assays. Cultures were regularly tested and shown to be mycoplasma free.

To obtain mouse primary OPCs, whole brain was removed from postnatal day 2 pups anaesthetized on ice. Brains were placed in cold DMEM/F12, and the cortices were isolated and the meninges were removed. The cortices were manually chopped and processed with the Tumour Dissociation Kit (Miltenyi) and incubated at 37 °C. The cell suspension was filtered through a 70 µm filter and centrifuged at 200g at room temperature. The cells were washed in DMEM/F12, re-centrifuged and plated in poly-ornithine- and laminin-treated flasks containing DMEM/F12 supplemented with N2 Max, B27 (ThermoFisher), 20 ng/ml FGF, and 20 ng/ml PDGF. OPCs were passaged once before treatment. Media was changed every 48 h.
In vitro phenotypic screening of OPCs. EpiSC-derived OPCs were grown and expanded in poly-ornithine (PO) and laminin-coated flasks with growth medium (DMEM/F12 supplemented with N2-MAX (R&D Systems), B-27 (ThermoFisher), GlutaMax (Gibco), FGF2 (10 µg/ml, R&D Systems, 233-FB-025) and PDGF-AA (10 µg/ml, R&D Systems, 233-AA-050)) before harvesting for plating. The cells were seeded onto poly-D-lysine 96-well CellCarrier or CellCarrier Ultra plates (PerkinElmer) coated with laminin (Sigma, L2020; 15 µg/ml) using a multi-channel pipette. For the experiment, an 800,000 cells/ml stock in differentiation medium (DMEM/F12 supplemented with N2-MAX and B-27) was prepared and stored on ice for 2 h. Then, 4,000 cells were seeded per well in differentiation medium and allowed to attach for 30 min before addition of drug. For dose-response testing of all molecules except sterols, a 1,000× compound stock in dimethyl sulfoxide (DMSO) was added to assay plates with 0.1 µl solid pin multi-blot replicators (V&P Scientific; VP 40), resulting in a final primary screening concentration of 1×. Sterols were added to cells as an ethanol solution (0.2% final ethanol concentration). Positive control wells (ketoconazole, 2.5 µM) and DMSO vehicle controls were included in each assay plate. Cells were incubated under standard conditions (37 °C, 5% CO2) for 3 days and fixed with 4% paraformaldehyde (PFA) in phosphate buffered saline (PBS) for 20 min. Fixed plates were washed with PBS (200 µl per well) twice, permeabilized with 0.1% Triton X-100 and blocked with 1% donkey serum (v/v) in PBS for 40 min. Then, cells were labelled with antibodies recognizing MBP (Abcam, ab7349; 1:200) or PLP1 (1:1,000, clone AA3, generously provided by B. Trapp, Cleveland Clinic) at 4 °C followed by detection with Alexa Fluor-conjugated secondary antibodies (1:00) for 5 min. Nuclei were visualized by DAPI staining (Sigma; 1 µg/ml). During washing steps, PBS was added using a multi-channel pipette and aspiration was performed using a Biotek EL406 washer dispenser (Biotek) equipped with a 96-well aspiration manifold.
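
The pin-transfer dilution above follows simple volume arithmetic, sketched here. The 100 µl well volume is an assumption chosen because it makes a 0.1 µl transfer of a 1,000× stock come out at roughly 1×; the Methods do not state the well volume, and the function name is hypothetical.

```python
def final_concentration(stock, transfer_ul, well_ul):
    """Concentration after adding transfer_ul of stock to well_ul of medium."""
    return stock * transfer_ul / (well_ul + transfer_ul)


# A 1,000x stock delivered by a 0.1 ul pin into an assumed 100 ul well
fold = final_concentration(1000.0, 0.1, 100.0)

# Resulting DMSO vehicle content of the well, in percent (v/v)
dmso_percent = 100.0 * 0.1 / (100.0 + 0.1)

print(f"final concentration = {fold:.3f}x, DMSO = {dmso_percent:.3f}%")
```

Under this assumption the vehicle concentration stays near 0.1% DMSO, which is why a matched DMSO control well is the natural comparator.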
High-content imaging and analysis. Plates were imaged on the Operetta High Content Imaging and Analysis system (PerkinElmer) and a set of 6 fields captured from each well, resulting in an average of 1,20 cells being scored per well. Analysis (PerkinElmer Harmony and Columbus software) began by identifying intact nuclei stained by DAPI; that is, those traced nuclei that were larger than 00 µm² in surface area. Each traced nucleus region was then expanded by 50% and cross-referenced with the mature MBP stain to identify oligodendrocyte nuclei, and from this the percentage of oligodendrocytes was calculated. In some experiments, PLP1 staining was performed instead of MBP, or the total process length of MBP+ oligodendrocytes was calculated as previously described.
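
The scoring logic of this analysis can be sketched as follows. This is not the Harmony or Columbus implementation: the `Nucleus` record, the overlap flag and the 50 µm² cut-off are hypothetical stand-ins, and the real threshold should be taken from the analysis settings.

```python
from dataclasses import dataclass


@dataclass
class Nucleus:
    area_um2: float    # traced DAPI area
    mbp_overlap: bool  # expanded nucleus region overlaps the MBP stain


def percent_oligodendrocytes(nuclei, min_area_um2):
    """Percentage of intact nuclei that are scored as MBP+."""
    intact = [n for n in nuclei if n.area_um2 > min_area_um2]
    if not intact:
        return 0.0
    positive = sum(1 for n in intact if n.mbp_overlap)
    return 100.0 * positive / len(intact)


well = [
    Nucleus(120.0, True),
    Nucleus(95.0, False),
    Nucleus(20.0, True),   # too small: treated as debris and ignored
    Nucleus(110.0, False),
    Nucleus(130.0, True),
]
print(percent_oligodendrocytes(well, min_area_um2=50.0))
```

Filtering on nuclear area before scoring keeps fragmented nuclei out of both the numerator and the denominator.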

OPC differentiation and sterol profiling after methyl-β-cyclodextrin treatment. EpiSC-derived OPCs harvested from culture flasks were resuspended in 10 ml of differentiation medium to a final cell density of 00,000 cells/ml. To this, cell-culture-grade water or methyl-β-cyclodextrin (1 mM) was added and incubated at 37 °C. After 30 min, the cells were washed twice with differentiation medium (5 ml), and split into two portions for differentiation and sterol profiling. The 1,000,000 cells per condition were directly processed as described in GC-MS-based sterol profiling to measure the endogenous sterol levels. For differentiation, the cells were resuspended in differentiation medium to a final cell density of 00,000 cells/ml and plated in a PDL/laminin-coated 96-well CellCarrier Ultra plate. After 72 h, the cells were fixed, stained, imaged and quantified as described above.

High-throughput screening of 3,000 bioactive small molecules. EpiSC-derived OPCs were grown and expanded in poly-ornithine and laminin-coated flasks before harvesting for plating. Cells were dispensed in differentiation medium supplemented with Noggin (R&D Systems; 100 ng/ml), Neurotrophin 3 (R&D Systems; 10 ng/ml), cAMP (Sigma; 50 µM), and IGF-1 (R&D Systems; 100 ng/ml) using a Biotek EL406 Microplate Washer Dispenser (Biotek) equipped with a 5 µl dispense cassette (Biotek), into poly-D-lysine/laminin (Sigma, L2020; 4 µg/ml)-coated sterile, 384-well, CellCarrier Ultra plates (PerkinElmer), to a final density of 12,500 cells per well and allowed to attach for 45 min before addition of drug. A 3 mM stock of bioactive compound library in dimethylsulphoxide (DMSO) was prepared in an Abgene storage 384-well plate (ThermoFisher Scientific; AB1055). These were added to assay plates using a 50 nl solid pin tool attached to a Janus automated workstation (Perkin Elmer), resulting in a final screening concentration of 2 µM. Cells were incubated at 37 °C for 1 h and then T3 (Sigma; 40 ng/ml) was added to all wells except negative controls, to which FGF (20 ng/ml) was added instead. Negative controls and T3-alone were included in each assay plate. After incubation at 37 °C for 72 h, cells were fixed, washed and stained similarly to the 96-well OPC assay protocol, although all the washing steps were performed using a Biotek EL406 Microplate Washer Dispenser (Biotek) equipped with a 96-well aspiration manifold. Cells were stained with DAPI (Sigma; 1 µg/ml) and MBP antibody (Abcam, ab7349; 1:100). Plates were imaged on the Operetta High Content Imaging and Analysis system (PerkinElmer) and a set of fields captured from each well, resulting in an average of 700 cells being scored per well. Analysis was performed as in High-content imaging and analysis, above. All plates for the primary screen were processed and analysed simultaneously to minimize variability. Molecules causing more than 20% reduction in nuclear count relative to DMSO control wells were removed from consideration, and hits were called on the basis of largest fold-increase in percentage of MBP+ oligodendrocytes relative to DMSO controls within the same plate. When selecting the leading hits for further experiments, molecules obtained in previous screens were omitted, including imidazole antifungals and clemastine.
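
The hit-calling rule of the primary screen can be expressed as a short per-plate filter and ranking. The sketch below assumes each well is summarized by a nuclear count and a percentage of MBP+ cells; the dictionary keys, compound labels and values are hypothetical.

```python
from statistics import mean


def call_hits(wells, dmso_wells, max_nuclei_loss=0.20, top_n=3):
    """Rank the compounds of one plate by fold-increase in % MBP+ cells over DMSO."""
    dmso_nuclei = mean(w["nuclei"] for w in dmso_wells)
    dmso_pct = mean(w["pct_mbp"] for w in dmso_wells)
    scored = []
    for w in wells:
        # Drop wells with more than a 20% reduction in nuclear count
        if w["nuclei"] < (1 - max_nuclei_loss) * dmso_nuclei:
            continue
        scored.append((w["compound"], w["pct_mbp"] / dmso_pct))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


dmso_wells = [{"nuclei": 1000, "pct_mbp": 5.0}, {"nuclei": 1000, "pct_mbp": 5.0}]
wells = [
    {"compound": "A", "nuclei": 950, "pct_mbp": 20.0},
    {"compound": "B", "nuclei": 700, "pct_mbp": 40.0},  # excluded: 30% fewer nuclei
    {"compound": "C", "nuclei": 900, "pct_mbp": 10.0},
]
hits = call_hits(wells, dmso_wells)
print(hits)
```

Normalizing to the DMSO wells of the same plate, rather than to a screen-wide mean, limits the influence of plate-to-plate drift on the ranking.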

GC-MS-based sterol profiling. EpiSC-derived OPCs were plated at 0.5 million cells per ml in PDL- and laminin-coated six- or twelve-well plates with differentiation media. After 24 h, cells were dissociated with Accutase, rinsed with saline, and cell pellets were frozen. For sterol analyses, cells were lysed in methanol (Sigma-Aldrich) with agitation for 30 min and cell debris removed by centrifugation at 10,000 rpm for 15 min. Cholesterol-d7 standard (25,26,26,26,27,27,27-2H7-cholesterol, Cambridge Isotope Laboratories) was added before drying under nitrogen stream and derivatization with 55 µl of bis(trimethylsilyl)trifluoroacetamide/trimethylchlorosilane to form trimethylsilyl derivatives. Following derivatization at 60 °C for 20 min, 1 µl was analysed by GC-MS using an Agilent 5973 Network Mass Selective Detector equipped with a 6890 gas chromatograph system and an HP-5MS capillary column (60 m × 0.25 mm × 0.25 µm). Samples were injected in splitless mode and analysed using electron impact ionization. Ion fragment peaks were integrated to calculate sterol abundance, and quantitation was relative to cholesterol-d7. The following m/z ion fragments were used to quantitate each metabolite: cholesterol-d7 (465), FF-MAS (482), cholesterol (368), zymostenol (458), zymosterol (456), desmosterol (456, 343), 7-dehydrocholesterol (456, 35), lanosterol (383), lathosterol (458), 14-dehydrozymostenol (456). Calibration curves were generated by injecting varying concentrations of sterol standards and maintaining a fixed amount of cholesterol-d7. The human glioma cell line GBM528 was a gift from Jeremy Rich (Cleveland Clinic). These cells were validated as unique by STR profiling.
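
Internal-standard quantitation of this kind can be sketched as a calibration line of the analyte-to-standard peak-area ratio against the amount of sterol standard injected. The linear model, the function names and all numbers below are illustrative assumptions; the Methods do not specify the curve-fitting procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(analyte_area, standard_area, slope, intercept):
    """Convert a peak-area ratio to an amount using the calibration line."""
    ratio = analyte_area / standard_area
    return (ratio - intercept) / slope


# Hypothetical calibration: sterol amount (ug) against area ratio to cholesterol-d7
amounts = [1.0, 2.0, 4.0]
ratios = [0.5, 1.0, 2.0]
slope, intercept = fit_line(amounts, ratios)

# Hypothetical sample: sterol peak area 300, cholesterol-d7 peak area 200
sample_amount = quantify(300.0, 200.0, slope, intercept)
print(f"estimated amount = {sample_amount:.2f} ug")
```

Because every sample carries the same fixed amount of cholesterol-d7, taking the area ratio cancels injection-to-injection variation before the calibration line is applied.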

LC-MS-based sterol profiling. Sterols were extracted after treatment of OPC-5 OPCs with ketoconazole as described in GC-MS-based sterol profiling above. Picolinate derivatization, chromatographic separation, and mass spectrometric detection were performed as reported previously. Peaks from selective reaction monitoring were integrated to calculate sterol abundance, and quantitation was relative to cholesterol-d7.

Human cortical spheroids. Human cortical spheroids were generated as described previously with modifications to enable the induction and differentiation of OPCs. In brief, spheroids were treated with miconazole or ifenprodil (2 µM) from days 62-72 and assayed on day 93 for MyRF+ oligodendrocytes (rabbit anti-MyRF antibody was generously provided by M. Wegner and used at 1:1,000).

CYP51 enzymatic assay. CYP51 enzymatic activity was measured using a reported method with slight modifications: rat CYP51 (Cypex, Inc.) was used as enzyme; reaction volume was $0 µl; reaction time was 40 min; lanosterol concentration was 50 µM; and reactions were quenched with $0 µl isopropanol. Finally, 15 µl of each reaction/isopropanol mixture was injected onto a SCIEX Triple Quad 6500 LC-MS/MS system using an APCI ion source in positive ion mode with a Shimadzu UFLC-20AD HPLC and a Phenomenex Kinetix C18XB 50 × 2.1 × 2. column at 40 °C.
EBP enzymatic assay. EBP enzymatic activity was measured using a reported method with slight modifications: active EBP was obtained from mouse microsomes, inhibitors were added, zymostenol was added at a final concentration of 25 µM in a final reaction volume of 500 µl, and the reaction incubated at 37 °C for 2 h. Sterols were extracted using 3 × 1 ml hexanes, cholesterol-d7 was added to enable quantitation, and the pooled organics were dried (Na2SO4) and evaporated under nitrogen gas. Samples were then silylated and analysed using GC-MS as described above.

siRNA treatments. Cell-permeable siRNAs were obtained as pools of 4 individual siRNAs targeting mouse CYP51, or a non-targeting control (Accell siRNAs, Dharmacon). Pooled CYP51 siRNA sequences: GUCUGUUUUGAGAUUAGU; CGACUAUGCUUCGUUUAUA; CCCUGCUCUUCAAUAGUAA; CUAUUAAGUUAUUGUGAAC. Non-targeting control siRNA: UGGUUUACAUGUCGACUAA. For differentiation analysis, cells were plated in a 96-well plate (as detailed above) and treated with 1 µM pooled siRNA suspended in RNase-free water (diluted in differentiation media as detailed above). For sterol analysis, cells were plated in a six-well plate at 300,000 cells per well in standard differentiation media supplemented with PDGF (R&D Systems; 20 ng/ml), neurotrophin 3 (R&D Systems; 10 ng/ml), cAMP (Sigma; 50 µM), IGF-1 (R&D Systems; 100 ng/ml) and noggin (R&D Systems; 100 ng/ml). At 24 h, 1 µM siRNA was added to the media. Cells were grown for three more days in siRNA-containing media with growth factor supplementation every 48 h, before harvesting and processing for GC-MS analysis as detailed above.

CRISPR-Cas9-mediated targeting of EBP. Guide RNA sequences were obtained using the Broad Brie library and manufactured by IDT. Nucleotide sequences (sgRNA sequences: GAAACGCAATCACTACCCAT (sgEBP), GUGGCCTAATTGTGATCACG (sgEBP2)) were prepared and inserted into the lentiCRISPRv2 plasmid (Addgene 52961) using the instructions from GeCKO library preparation: in brief, FastDigest BsmBI (Fermentas) was used for plasmid digestion, T4 PNK (NEB M0201S) for nucleotide annealing, and Quick Ligase (NEB M2200) for sgRNA insertion. Insertion was confirmed by Sanger sequencing. HEK293T cells were transfected using Lenti-X shots as per the manufacturer's protocol (Clontech). After 24 h, the media was changed to OPC media for collection of virus. 48 h later, the media was collected, supplemented with FGF, PDGF and protamine sulfate (Sigma), and used to transduce OPCs. 24 h later, the media was changed to non-virus-containing media for 48 h. Cells underwent two 48 h stretches of puromycin selection (Invitrogen). After 24 h of recovery in non-selection media, cells were plated for differentiation, GC-MS, and qPCR as described above.

Focal demyelination, drug treatment and histological analysis. Focal demyelination in the dorsal column of the spinal cord was induced by the injection of 1% LPC solution. 12-week-old C57BL/6 female mice were anaesthetized using isoflurane and T10 laminectomies were performed. 1 µl of 1% LPC was infused into the dorsal column at a rate of 15 µl/h. At day 4, animals were randomized into treatment groups before treatment (2 animals were excluded due to surgical complications). Between days 4 and 11 post-laminectomy, animals received daily injections of either vehicle or drug intraperitoneally. Drugs were dissolved in DMSO or corn oil and then diluted with sterile saline for injections such that final doses were 2 mg/kg for tamoxifen and 10 mg/kg for ifenprodil. This experiment was done in a blinded manner: compounds were coded to ensure the researchers performing the experiments were unaware of the treatment being administered to each animal. All animals were euthanized 12 days post-laminectomy (n = 4-6 per group). Mice were anaesthetized using ketamine/xylazine rodent cocktail and then euthanized by transcardial perfusion with 4% PFA, 2% glutaraldehyde, and 0.1 M sodium cacodylate. Samples were osmicated, stained en bloc with uranyl acetate and embedded in EMbed 812, an Epon-812 substitute (EMS). 1 µm sections were cut and stained with toluidine blue and visualized on a light microscope (Leica DM5500B). The number of myelinated axons per unit area was counted from sections obtained from the middle of each lesion and then averaged over each treatment group. All sections within the lesion area were scored (vehicle, 10 sections; tamoxifen, 1 sections; ifenprodil, 28 sections). A Mann-Whitney statistical analysis was performed to assess statistical significance.

Analysis of mouse brain sterol levels. Ten- to twelve-week-old male C57BL/6 mice were injected with 2 mg/kg tamoxifen, 10 mg/kg ifenprodil or 10 mg/kg miconazole dissolved in corn oil (tamoxifen) or DMSO (ifenprodil, miconazole) in sterile saline daily for three days. Mice were anaesthetized with isoflurane and perfused with phosphate buffered saline to remove blood from the brain. Brains were collected and flash frozen using liquid nitrogen. The samples were pulverized and 50-100 mg of tissue were collected for further processing. A modified Folch protocol was used for extraction of sterols. Briefly, samples were resuspended in a 2:1 chloroform/methanol mixture and homogenized. Cell debris was removed by centrifugation at 4000 for 10 min. The solution was dried under air and resuspended in hexane with a cholesterol-d7 standard and dried again. Lipids were derivatized with 70 µl of bis(trimethylsilyl)trifluoroacetamide, then injected and analysed by GC-MS as described above.

Oestrogen-dependent cell proliferation assay. Oestrogen-dependent cell proliferation was measured as previously described with minor modifications. After growth in oestrogen-free media (phenol red-free RPMI supplemented with 10% charcoal-stripped fetal bovine serum) for 5 days, cells were seeded at 2,500 cells/well into 96-well plates. The following day, 3× drug-containing media was added to triplicate wells and cells were allowed to grow for an additional 5 days at 37 °C in a standard 5% CO2 humidified incubator. Total DNA per well was measured using an adaptation of the method of Labarca and Paigen. At this time, media was removed, cells were washed one time with 0.25× PBS and 100 µl of distilled water was added. Plates were frozen and thawed to enhance cell lysis and 200 µl of 10 µg/ml Hoechst 33258 (Sigma-Aldrich, St. Louis, MO) in 2 M NaCl, 1 mM EDTA, Tris-HCl pH 7 was added. After incubation at room temperature for 2 h, plates were read in a SpectraMax i fluorescent plate reader (Molecular Devices, Sunnyvale, CA) with excitation at 360 nm and emission at 460 nm. All values were converted to microgram DNA per well using a standard curve derived from purified salmon testes DNA. T47D cells were provided by the Translational Research Shared Resource of the Case Comprehensive Cancer Center and used without further authentication beyond the observed oestrogen-dependent cell proliferation.
Oligodendrocyte formation and imaging on electrospun microfibres. A 12-well plate containing Mimetix aligned scaffold (microfibre plate, AMSBIO, AMS.TECL-006-1X, electrospun poly-L-lactide scaffold, 2 µm fibre diameter cell crown inserts) was prepared as previously described. In brief, fibre inserts were sterilized with 70% ethanol and washed with PBS before being coated with poly-ornithine and laminin. After laminin coating, 100,000 cells/ml of EpiSC-derived OPCs (1.5 ml/well) were plated in differentiation medium. After 4 h, the media was replaced with fresh media containing small-molecule treatments. Every 48 h, the media was replaced with fresh compound-containing media for a total of 14 days. Plates were fixed with 4% PFA, permeabilized with 0.1% Triton X-100, and blocked with 10% donkey serum (v/v) in PBS for 60 min. Plates were stained for MBP (Abcam, ab7349; 1:100) and DAPI (Sigma; 5 µg/ml). After staining, the inserts were moved into a new 12-well plate and covered with PBS before imaging. Plates were imaged on the Operetta High Content Imaging and Analysis system (PerkinElmer) and a set of fields captured from each well, resulting in an average of 45,000 cells being scored per well. Analysis (PerkinElmer Harmony and Columbus software) identified intact nuclei stained by DAPI and calculated the MBP signal intensity per cell per well. Microfibre insert tracking images were taken using a Leica DMG with 20× dry/NA 0.40 objective. Microfibre plate inserts were mounted using Fluoromount-G (SouthernBiotech) and allowed to partially harden before coverslips were added and the insert ring was removed. Confocal images were obtained on a Leica SP8 confocal scanning microscope, with 40× oil/NA 1.30 objective. Confocal stacks of 0.336 µm z-steps were taken at 1,024 × 1,024. Each fluorophore was excited sequentially and all contrast and brightness changes were applied consistently between images.

A separate analysis approach was performed on an independent experiment performed as above except the small-molecule treatment was limited to the first 4 days of the 14 day culture period. After staining, the fibre inserts were mounted on a glass slide (Fisherbrand Superfrost Plus Microscope Slides) using Fluoromount-G (SouthernBiotech) with a cover glass (Fisherbrand Microscope Cover Glass) and dried at RT in the dark for 3 h. The mounted inserts were imaged on the Operetta High Content Imaging and Analysis system (PerkinElmer) and a set of 22 fields captured from each condition, resulting in an average of 2,000 cells being scored per well. The total microfibre area was calculated using bright-field imaging and a spot-finding function (area larger than 2 pixels). The MBP+ pixel area within the defined microfibre area was then defined and the percentage of the total microfibre area calculated.

CYP51 qPCR. Cells were plated at $00,000 cells per well in a six-well plate and were grown in standard differentiation media supplemented with PDGF, neurotrophin 3, cAMP, IGF-1, and noggin for four days as described above. At 24 h, cells were treated with siRNA. Growth factors were added every 48 h. After three days of siRNA treatment, RNA was isolated with the RNeasy Mini Kit (Qiagen), and cDNA was made using High-Capacity RNA-to-cDNA Kit (Applied Biosystems). Exon-spanning primers for ActinB (Thermo-Fisher, Taqman, Mm02619580_g1) and CYP51 (Thermo-Fisher, Taqman, Mm00490968_m1) were used for detection of relative RNA levels by quantitative real-time PCR (Applied Biosystems, 7300 Realtime PCR system). Cycle time and outliers were calculated using Applied Biosystems 7300 System Sequence Detection Software version 1.

EBP qPCR. OPCs were accutased, 1 million cells per cell line were spun down and RNA was isolated with the RNeasy Mini Kit (Qiagen). DNA was removed using DNase (Invitrogen), and cDNA was made using High-Capacity RNA-to-cDNA Kit (Applied Biosystems). Primers for exon 5 of EBP (forward primer: TGTGCGAGGAGGAAGAAGAT; reverse primer: GATAGGCCACCCCGTTTATT) and GAPDH (forward primer: AGGTCGGTGTGAACGGATTTG; reverse primer: GGGGTCGTTGATGGCAACA) were manufactured by IDT, and Power SYBR Green Master Mix (Applied Biosystems) was used for detection of relative RNA levels by quantitative real-time PCR (QuantStudio 7 Flex system). Cycle time and outliers were calculated using QuantStudio Software V1.3.

Muscarinic receptor antagonism assay. GeneBLAzer M1-NFAT-bla CHO-K1
cells (or M3- or M5-NFAT-bla CHO-K1 cells) (ThermoFisher) were thawed into
Assay Media (DMEM, 10% dialysed FBS, 5 mM HEPES pH 7.3, 0.1 mM NEAA).
10,000 cells/well were added to a 384-well TC-treated assay plate and incubated
16-24 h at 37 °C. 4 µl of a 10× stock of antimuscarinic molecules was added to
the plate and incubated 30 min. Control agonist carbachol (10× stock) at the
predetermined EC50 concentration was then added to wells containing antimuscarinic
molecules. The plate was incubated 5 h and 8 µl of 1 µM substrate + solution
loading solution was added to each well and the plate was incubated 2 h at room
temperature before reading on a fluorescence plate reader. This cell line was
validated in each run on the basis of Z′ > 0.5 for carbachol versus control treatment.
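
The run-validation criterion above is read here as a Z′-factor threshold (Z′ > 0.5) for carbachol versus control wells, which is itself an assumption about the garbled source. Under that reading, a minimal sketch of the standard Z′-factor, 1 − 3(σ_pos + σ_neg)/|μ_pos − μ_neg|, is shown below; the function name and well readings are hypothetical.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z'-factor for two sets of control wells (sample standard deviations)."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation


# Hypothetical plate-reader values for agonist and vehicle control wells.
carbachol_wells = [100.0, 102.0, 98.0, 100.0]
control_wells = [10.0, 12.0, 8.0, 10.0]

score = z_prime(carbachol_wells, control_wells)
print(round(score, 3), "run accepted" if score > 0.5 else "run rejected")
```

Values approaching 1 indicate well-separated controls with little scatter; a run falling at or below 0.5 would be rejected under the assumed criterion.
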
SREBP qPCR. Cells were plated at 1 million cells per well in a six-well plate and
were grown in standard differentiation media supplemented with DMSO,
mevastatin (2.5 µM), Ro 48-8071 (500 nM), ketoconazole (2.5 µM), TASIN-1
(100 nM), or amorolfine (100 nM). At 24 h, RNA was isolated with the RNeasy
Mini Kit (Qiagen), and cDNA was made using High-Capacity RNA-to-cDNA Kit
(Applied Biosystems). Exon spanning primers for ActinB (Thermo-Fisher, Taqman,
Mm02619580_g1), LSS (Thermo-Fisher, Taqman, Mm00461312_m1), LDLR
(Thermo-Fisher, Taqman, Mm01177349_m1), and DHCR7 (Thermo-Fisher,
Taqman, Mm00514571_m1) were used for detection of relative RNA levels by
quantitative real time PCR (Applied Biosystems, 7300 Realtime PCR system). Cycle
time and outliers were calculated using Applied Biosystems 7300 System Sequence
Detection Software version 1.

NR2C2 and NR2F1 luciferase assays. Forty-eight hours before transfection,
100,000 HEK293T cells were plated per well in a 24-well plate. HEK293T cells were
chosen because they were used previously in this assay and not validated further.
NR2C2 (Origene, Mi221079) or NR2F1 (gift from C. Schaaf) and NGFI-A promoter
reporter plasmid (gift from C. Schaaf) were transfected using Lipofectamine 2000
(Thermo Fisher, 11668027) as per the manufacturer's protocol. After 16 h, HEK293T
cells were treated with the compounds (2,2-dimethyl-zymosterol 5 µM, FF-MAS
10 µM, ketoconazole 2.5 µM, TASIN-1 100 nM, mevastatin 2.5 µM, liothyronine
3 µM and all-trans retinoic acid 5 µM). 32 h later cells were lysed using firefly
luciferase assay system (Promega, E1500) and read out using Synergy Neo2 High
Performance plate reader.

Nuclear receptor profiling. Luciferase reporter assays performed by Indigo
Biosciences were used to assess interaction of 2,2-dimethylzymosterol (5 µM),
ketoconazole (2.5 µM), and TASIN-1 (250 nM) with human ERα, GR, LXRβ,
NFκB, NRF2, PGR, PPARδ, PPARγ, RARα, RARγ, RXRα, RXRβ, TRα, TRβ and
VDR in agonist mode and ERRα, RORα and RORγ in inverse-agonist mode. The
reporter for these assays is firefly luciferase linked with either the genetic response
elements (GRE) or the Gal4 upstream activation sequence (UAS). These cells also
express either the native receptor or a receptor in which the native N-terminal
DNA binding domain (DBD) has been replaced with that of the yeast Gal4 DBD.
The specifics of each assay are shown in the table below. In brief, a suspension of
reporter cells was prepared in cell recovery medium (CRM; containing 3% (ROR)
or 10% charcoal stripped FBS for others). 100 µl of the reporter cell suspension
was dispensed into wells of a white 96-well assay plate. Test compound, reference
compounds, and the respective vehicle were diluted into INDIGO compound
screening medium (CSM; containing 5% (ROR) or 10% charcoal stripped FBS for
others). 100 µl of each treatment medium was dispensed into duplicate assay wells
pre-dispensed with reporter cells. Assay plates were incubated at 37 °C for 24 h.
Following the incubation period, for agonist and inverse-agonist assays, treatment
media were discarded and 100 µl/well of luciferase detection reagent was added.
RLU were quantified from each assay well to determine agonist or inverse-agonist
activity using the following assay design:

ERα (NR3A1); native receptor; ER GRE-luciferase
ERRα (NR3B1); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
GR (NR3C1); native receptor; GR GRE-luciferase
LXRβ (NR1H2); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
PGR (NR3C3); native receptor; PGR GRE-luciferase
PPARδ (NR1C2); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
PPARγ (NR1C3); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
RARα (NR1B1); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
RARγ (NR1B3); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
RORα (NR1F1); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
RORγ (NR1F3); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
RXRα (NR2B1); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
RXRβ (NR2B2); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
TRα (NR1A1); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
TRβ (NR1A2); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
VDR (NR1I1); Gal4 DBD hybrid receptor; Gal4 UAS-luciferase
NFκB; native NFκB; NFκB GRE-luciferase
NRF2; native receptor; ARE-luciferase

Animal welfare. All animal experiments were performed in accordance with
protocols approved by the Case Western Reserve University and George Washington
University Institutional Animal Care and Use Committees.

Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Data availability. The data supporting the findings of this study are available
within the paper (and its Supplementary Information) or from the corresponding
author upon request. Source Data for all GC-MS-based sterol profiling
experiments and animal experiments are provided with the paper.
Extended Data Fig. 1 | Expanded cholesterol synthesis pathway
diagram. The cascade cyclization of squalene epoxide, catalysed by
lanosterol synthase (LSS), provides the first sterol, lanosterol. Processing
of lanosterol to cholesterol can proceed via the Kandutsch-Russell
and/or Bloch pathways, which use the same enzymes and process
substrates that vary only in the presence or absence of the C24 double
bond. Intermediates in blue have been confirmed in our GC-MS-based
sterol profiling assay using authentic standards. Sterol 14-reductase
activity in mouse is shared by two genes, TM7SF2 and LBR. Consistent
with past reports, inhibition of sterol 14-reductase activity can lead
to accumulation of the expected upstream intermediate (FF-MAS) or
14-dehydrozymostenol, also known as cholesta-8,14-dien-3β-ol. Green
indicates enzyme targets and small molecules whose inhibition promotes
oligodendrocyte formation.
Extended Data Fig. 2 | CYP51 is the functional target by which
imidazole antifungals enhance oligodendrocyte formation. a, Azole
molecules with varying degrees of potency for mammalian CYP51
inhibition. Throughout, green labels indicate molecules considered active,
while red labels indicate inactive molecules. b, Percentage of MBP+
oligodendrocytes generated from a second independent derivation
of OPCs (OPC-1) at 72 h following treatment with the indicated
concentrations of azoles. n = 4 wells per condition except DMSO (…),
with >1,000 cells analysed per well. c, GC-MS-based quantification
of lanosterol levels in a second derivation of OPCs (OPC-1) treated
for 24 h with the indicated azoles at 2.5 µM. n = 2 wells per condition.
d, e, GC-MS-based quantification of cholesterol levels in OPCs (OPC-5,
OPC-1) treated for 24 h with the indicated azoles at 2.5 µM. n = 2
wells per condition. f, g, GC-MS-based quantification of lanosterol
levels in OPCs (OPC-5, OPC-1) treated for 24 h with the indicated
doses of ketoconazole. n = 2 wells per condition. Concentrations shown
in f and g mirror those shown in b and Fig. 1c. h, Percentage of MBP+
oligodendrocytes generated from mouse primary OPCs at 72 h following
treatment with the indicated imidazole antifungals at 3 µM. n = 4 wells
per condition, with >1,000 cells analysed per well. i, GC-MS-based
quantification of lanosterol levels in mouse primary OPCs treated for 24 h
with the indicated imidazole antifungals at 3 µM. n = 2 wells per condition.
j, Assessment of oligodendrocyte formation using an alternative image
quantification metric, fold increase in total neurite length. Re-analysis of
data shown in Fig. 1c. n = 4 wells per condition except DMSO (n = 24),
with >1,000 cells analysed per well. k, Percentage of oligodendrocytes
generated from OPCs at 72 h following treatment with ketoconazole
(2.5 µM) as measured by PLP1 immunostaining. Left, OPC-5; right,
OPC-1. n = 8 wells per condition, with >1,000 cells analysed per well.
l, LC-MS-based quantification of lanosterol levels in OPC-5 cells treated
for 24 h with ketoconazole at 2.5 µM. n = 2 wells per condition. m, CYP51
mRNA levels measured by RT-qPCR following 96-h treatment with
non-targeting or CYP51-targeting pools of cell-permeable siRNAs. n = 2
wells per condition. n, GC-MS-based quantification of lanosterol levels
in OPC-1 cells treated for 96 h with the indicated pooled siRNA reagents.
n = 2 wells per condition. o, Percentage of MBP+ oligodendrocytes
generated from a second, independent batch of OPCs (OPC-1) at
72 h following treatment with the indicated reagents. n = 3 wells per
condition, with >1,000 cells analysed per well. p, Percentage of MBP+
oligodendrocytes generated from an independent derivation of OPCs
at 72 h following treatment with exogenous lanosterol. n = 4 wells per
condition except DMSO and ketoconazole (n = 8), with >1,000 cells
analysed per well. q, Representative images of OPC-5 cells treated for 72 h
with the indicated siRNA reagents and lanosterol. Nuclei are labelled with
DAPI (blue), and oligodendrocytes are indicated by immunostaining
for MBP (green). Scale bar, 100 µm. All bar graphs indicate mean ± s.d.,
and b, d, h, i, k, l, o and p are representative of two independent experiments
and all findings have been confirmed in a second independent derivation
of OPCs (Fig. …
Extended Data Fig. 3 | Effect of small-molecule inhibition of the
cholesterol biosynthesis pathway on enhancing oligodendrocyte
formation. a, GC-MS-based quantification of sterol levels in OPCs (OPC-5)
treated for 24 h with the indicated inhibitors of cholesterol biosynthesis.
Left, cholesterol; right, desmosterol. n = 2 wells per condition. Inhibitors
were used at the following doses unless otherwise noted: mevastatin,
ketoconazole, MGI-39, 2.5 µM; YM53601, 2 µM; Ro 48-8071, amorolfine,
TASIN-1, 100 nM; AY9944, 200 nM. b, GC-MS-based quantification of
sterol levels in a second derivation of OPCs (OPC-1). Left, cholesterol;
right, desmosterol. n = 2 wells per condition. c, GC-MS-based
quantification of the sterol intermediates expected to accumulate following
treatment of OPCs with the indicated inhibitors of cholesterol biosynthesis
for 24 h. n = 2 wells per condition. d, GC-MS-based quantification of
the sterol intermediates expected to accumulate following treatment of
a second derivation of OPCs (OPC-1) with the indicated inhibitors of
cholesterol biosynthesis for 24 h. n = 2 wells per condition. In c and d,
no accumulation of other sterol intermediates indicative of off-target
effects within the cholesterol pathway was observed (see Source Data).
e, Representative images of OPC cells treated for 72 h with the indicated
small molecules. All treatments are at the highest concentration shown
in Fig. 2b. Scale bar, 100 µm. f, Percentage of MBP+ oligodendrocytes
generated from a second batch of OPCs (OPC-1) at 72 h following
treatment with the indicated cholesterol pathway inhibitors. n = 4 wells
per condition, except DMSO, n = 24, with >1,000 cells analysed per well.
g, Percentage of MBP+ oligodendrocytes generated from mouse primary
OPCs at 72 h following treatment with the indicated cholesterol pathway
inhibitors at 300 nM. n = 4 wells per condition, except DMSO, n = 8,
with >1,000 cells analysed per well. h, GC-MS-based quantification
of sterol intermediate levels in mouse primary OPCs treated for 24 h
with the indicated inhibitors of cholesterol biosynthesis at 300 nM. Left,
14-dehydrozymostenol levels following treatment with amorolfine; right,
zymostenol levels following treatment with TASIN-1. n = 2 wells per
condition. i, j, GC-MS-based quantification of sterol intermediate levels in
OPC-5 (i) and OPC-1 (j) cells treated for 24 h with the indicated doses of
inhibitors of cholesterol biosynthesis. Left, 14-dehydrozymostenol levels
following treatment with amorolfine; right, zymostenol levels following
treatment with TASIN-1. n = 2 wells per condition. Concentrations shown
in i mirror those shown in …. All bar graphs indicate mean ± s.d., and
a, e-h are representative of two independent experiments.
Extended Data Fig. 4 | Effect of independent chemical-genetic
and genetic modulators of CYP51, sterol 14-reductase and EBP
on oligodendrocyte formation and cholesterol biosynthesis.
a, d, g, Percentage of MBP+ oligodendrocytes generated from two
independent derivations of OPCs at 72 h following treatment with the
indicated concentrations of medroxyprogesterone acetate (a), 2-methyl
ketoconazole (d) or TASIN-449 (g). n = 4 wells per condition, except
DMSO, n = 12 in a, d. In g, for OPC-5, n = 4 except DMSO, n = 7; for
OPC-1, n = 3 except DMSO, n = 6. b, e, h, GC-MS-based quantification
of sterol levels in two independent derivations of OPCs treated for 24 h
with medroxyprogesterone acetate at 10 µM (b), 2-methyl ketoconazole
at 2.5 µM (e) and TASIN-449 at the indicated concentrations (h). n = …
wells per condition. c, f, Rat CYP51 enzymatic activity following treatment
with varying concentrations of medroxyprogesterone acetate (c) and
2-methyl ketoconazole (f) as measured by LC-MS-based quantification
of the CYP51 product FF-MAS. n = 2 independent enzymatic assays.
i, Percentage of MBP+ oligodendrocytes generated from OPCs (OPC-5)
infected with lentivirus expressing Cas9 and an independent guide
RNA targeting EBP (see also Fig. 2c). Eight wells per condition, with
>1,000 cells analysed per well. Two-tailed Student's t-test, *P = 0.0009.
j, Functional validation of CRISPR-based targeting of EBP with a second
sgRNA using GC-MS-based quantification of zymostenol levels. n = 2
wells per condition. k, EBP mRNA levels measured by RT-qPCR in OPCs
(OPC-5) infected with lentivirus expressing Cas9 and either of two guide
RNAs targeting EBP. One well per condition, with results validated in an
independent experiment. l, Representative images of the oligodendrocyte
formation assay shown in Fig. 2c. Nuclei are labelled with DAPI (blue),
and oligodendrocytes are indicated by immunostaining for MBP (green).
Scale bar, 100 µm. All bar graphs indicate mean ± s.d., and a, d, g, i, k are
representative of two independent experiments.
Extended Data Fig. 5 | Effect of 8,9-unsaturated sterols on
oligodendrocyte formation. a, Percentage of MBP+ oligodendrocytes
generated from OPCs (OPC-5) at 72 h following treatment with methyl
β-cyclodextrin (1 mM) for 30 min at 37 °C. n = 8 wells per condition,
with >1,000 cells analysed per well. b, GC-MS-based quantification of
cholesterol (left) and desmosterol (right) in OPCs (OPC-5) treated with
methyl β-cyclodextrin (Me-β-CD) at 1 mM or ketoconazole at 2.5 µM.
n = 2 wells per condition. c, d, Percentage of MBP+ oligodendrocytes
generated from OPC-1 (c) and OPC-5 cells (d) at 72 h following treatment
with the indicated purified sterol intermediates. n = 4 wells per condition,
except n = 8 for DMSO and ketoconazole, with >1,000 cells analysed per
well. Green text highlights metabolites that accumulate after treatments
that enhance oligodendrocyte formation (Fig. 2e, Extended Data Fig. 3).
e, Percentage of MBP+ oligodendrocytes generated from OPC-1 cells at
72 h following treatment with MAS-412 and MAS-414. n = 4 wells per
condition, with >1,000 cells analysed per well. f, Representative images
of OPC-5 cells treated for 72 h with DMSO, MAS-412, or MAS-414
(5 µM). Nuclei are labelled with DAPI (blue), and oligodendrocytes
are indicated by immunostaining for MBP (green). Scale bar, 100 µm.
g, Percentage of MBP+ oligodendrocytes generated from OPC-1 at 72 h
following treatment with 2,2-dimethyl-zymosterol. n = 4 wells per
condition except DMSO (n = 12), with >1,000 cells analysed per well. h,
Representative images of OPC-5 cells treated for 72 h with vehicle and
2,2-dimethyl-zymosterol (2.5 µM). Nuclei are labelled with DAPI (blue),
and oligodendrocytes are indicated by immunostaining for MBP (green).
Scale bar, 100 µm. i, Percentage of MBP+ oligodendrocytes generated
from OPC-5 (left) and OPC-1 (right) cells at 72 h following treatment
with FF-MAS or T-MAS. n = 4 wells per condition except DMSO and
ketoconazole (n = …), with >1,000 cells analysed per well. j, Percentage of
MBP+ oligodendrocytes generated from OPC-5 and OPC-1 OPCs at 72 h
following treatment with the indicated concentrations of cholesterol. n = 8
wells per condition, with >1,000 cells analysed per well. k, Percentage
of MBP+ oligodendrocytes generated from OPC-5 and OPC-1 cells at
72 h following treatment with the indicated concentrations of sterols that
are structurally identical aside from the presence or absence of the 8,9
double bond (structures in o). n > 3 wells per condition (see dot plots as
replicate values vary by condition), with >1,000 cells analysed per well.
l, m, Percentage of MBP+ oligodendrocytes generated from OPCs (OPC-5)
at 72 h following treatment with the indicated small molecules or
combinations of small molecules (ketoconazole, 2.5 µM; Ro 48-8071, 11
nM; liothyronine, 3 µM). n = 3 wells per condition, except DMSO n = 11,
ketoconazole n = 13, liothyronine n = 8 and liothyronine + Ro 48-8071
n = 4, with >1,000 cells analysed per well. n, GC-MS-based quantification
of lanosterol levels in OPCs (OPC-5) treated for 24 h with the indicated
small molecules or combinations of small molecules at concentrations
stated in m. n = 2 wells per condition. o, Structures of zymostenol,
8,9-dehydrocholesterol, 5α-cholestanol, and cholesterol. p, Total cell
number as measured by counting of DAPI+ nuclei in the experiment
presented in m. q, r, Percentage of MBP+ oligodendrocytes generated from
OPCs (OPC-5 and OPC-1) at 72 h following treatment with the indicated
small molecules or combinations of small molecules in two independent
batches of OPCs (ketoconazole, 2.5 µM; MAS-412, 5 µM). In q, n = … for
DMSO, 8 for ketoconazole, and 4 for remaining bars. In r, n = …
per condition. s, Luciferase reporter assays were used to assess whether
2,2-dimethylzymosterol (5 µM), ketoconazole (2.5 µM), and TASIN-1
(250 nM) modulate human ERα, GR, LXRβ, NFκB, NRF2, PGR, PPARδ,
PPARγ, RARα, RARγ, RXRα, RXRβ, TRα, TRβ and VDR transcriptional
activity in agonist mode and ERRα, RORα and RORγ in inverse-agonist
mode. n = 2 wells per condition and n = 3 wells per positive control
condition. t, Effects of sterols (2,2-dimethylzymosterol 5 µM, FF-MAS
10 µM) and small molecules (ketoconazole 2.5 µM, TASIN-1 100 nM) on
the NR2F1-mediated activation of a NGFI-A promoter driven luciferase
reporter. n = 2 wells per condition. u, Effects of 2,2-dimethylzymosterol
(5 µM) on NR2C2-mediated activation of a NGFI-A promoter driven
luciferase reporter in comparison to cells transfected with reporter only,
untreated, or treated with a previously reported positive control (all-trans
retinoic acid, ATRA, 5 µM). n = 2 wells per condition. v, LSS, DHCR7,
LDLR mRNA levels measured by RT-qPCR following 24 h treatment
with DMSO, mevastatin (2.5 µM), Ro 48-8071 (500 nM), ketoconazole
(2.5 µM), TASIN-1 (100 nM), or amorolfine (100 nM). n = 2 wells. All bar
graphs indicate mean ± s.d., and a-n and t-v are representative of two
independent experiments.
Extended Data Fig. 6 | Inhibiting CYP51, TM7SF2 and EBP is a
unifying mechanism for many small-molecule enhancers of
oligodendrocyte formation identified by high-throughput screening.
a, Percentage of MBP+ oligodendrocytes (relative to DMSO control wells)
generated from OPCs (OPC-1 derivation) at 72 h following treatment
with a library of 3,000 bioactive small molecules, each at 3 µM. Each
dot represents the result for one small molecule in the library. Red,
imidazole antifungals; blue, clemastine; green, EPZ005687, the top
novel hit molecule (Extended Data Fig. 7). b, c, Percentage of MBP+
oligodendrocytes generated from OPCs (left: OPC-5; right: OPC-1) at
72 h following treatment with ketoconazole, nine top molecules identified
by bioactives screening (green), and nine randomly chosen library
members (red) at a uniform dose of 5 µM. n = 4 wells per condition except
DMSO and ketoconazole, n = 12 wells, with >1,000 cells analysed per
well. d, GC-MS-based quantification of zymosterol, zymostenol, and
14-dehydrozymostenol levels in a second batch of OPCs treated for 24 h
with the indicated screening hits and randomly chosen library members
at 2 µM. n = 1; for validation in a second derivation of OPCs see Fig. 3.
Molecules are clustered by enzyme targeted (top labels). e, Percentage of
MBP+ oligodendrocytes generated from OPCs at 72 h following treatment
with the indicated doses of fulvestrant, one of the top 10 HTS hits. n = 4
wells per condition except DMSO, n = 12, with >1,000 cells analysed per
well. f, GC-MS-based quantification of lanosterol levels in OPCs treated
for 24 h with fulvestrant at 2 µM. n = 2 wells per condition. g-i, GC-MS-
based quantification of metabolite levels in OPCs treated for 24 h with the
indicated previously reported enhancers of oligodendrocyte formation
at the following doses: benztropine, 2 µM; clemastine, 1 µM; tamoxifen,
100 nM; U50488, … µM; bexarotene, 1 µM; liothyronine, 3 µM. n = 2 wells
per condition. j, k, Percentage of MBP+ oligodendrocytes generated from
OPCs (OPC-5 left, OPC-1 right) at 72 h following treatment with the
indicated previously reported enhancers of oligodendrocyte formation.
n = 4 wells per condition, except DMSO, n = …
for OPC-1, with >1,000 cells analysed per well. All doses are in µM.
l, Representative images of OPCs treated for 72 h with the indicated small
molecules. All treatments in l are at the highest concentration shown in
j. Scale bar, 100 µm. m, Structures of muscarinic receptor antagonists
used in this study. n, q, Percentage of MBP+ oligodendrocytes generated
from OPCs (OPC-5: top, OPC-1: bottom) at 72 h following treatment
with ketoconazole or the indicated muscarinic receptor modulators at
2 µM, the concentration used during screening. n = 4 wells per condition
except DMSO and ketoconazole, n = 8, with >1,000 cells analysed per
well. o, GC-MS-based quantification of three metabolite levels in OPC-5
OPCs treated for 24 h with U50488 (5 µM) or the indicated muscarinic
receptor modulators (2 µM). Left, zymostenol; centre, cholesterol; right,
desmosterol. n = 2 wells per condition. p, Heatmap indicating inhibition
of muscarinic receptor isoforms M1, M3, and M5 by the indicated small
molecules (2 µM) assayed using GeneBLAzer NFAT-bla CHO-K1 cells.
n = 2 wells per condition. r, GC-MS-based quantification of three
metabolite levels in OPC-1 OPCs treated for 24 h with clemastine (1 µM)
or the indicated muscarinic receptor modulators at 2 µM. n = 2 wells
per condition. Left, zymostenol; centre, zymosterol; right, cholesterol.
Sigma H127, p-fluorohexahydro-sila-difenidol. All bar graphs indicate
mean ± s.d., and b, c, e, j, k, n, q are representative of two independent
experiments.
Extended Data Fig. 7 | Effect of selective oestrogen receptor modulators
and EZH2 inhibitors on cellular EBP function and oligodendrocyte
formation. a, Structures of selective oestrogen receptor modulators used
in this study. b, Effects of ospemifene and toremifene on the oestrogen-
dependent growth of T47D cells. n = 3 wells per condition. c, d, Percentage
of MBP+ oligodendrocytes generated from two independent batches
of OPCs at 72 h following treatment with ospemifene and toremifene.
n = 4 wells per condition except DMSO and ketoconazole, n = 8, with
>1,000 cells analysed per well. e, Representative images of OPCs treated
for 72 h with the indicated small molecules. All molecules were used
at 300 nM. Scale bar, 100 µm. f, g, GC-MS-based quantification of
two metabolite levels in OPCs treated for 24 h with ospemifene and
toremifene at 300 nM. Left, zymostenol; right, cholesterol. n = 2 wells
per condition. h, Percentage of MBP+ oligodendrocytes generated from
two independent batches of OPCs at 72 h following treatment with
tamoxifen and 4-hydroxytamoxifen. Left, OPC-5; right, OPC-1. n = 4
wells per condition, except DMSO, n = 6 for OPC-1 (right). i, Effects of
tamoxifen and 4-hydroxytamoxifen on the oestrogen-dependent growth
of T47D cells. n = 3 wells per condition. j, GC-MS-based quantification
of zymostenol (left axis) and zymosterol levels (right axis) in OPC-5
and OPC-1 treated 24 h with tamoxifen and 4-hydroxytamoxifen at the
indicated concentrations. n = 2 wells per condition. k, Percentage of MBP+
oligodendrocytes generated from OPCs at 72 h following treatment with
the indicated structurally analogous EZH2 inhibitors. n = 4 wells per
condition, except DMSO, n = 12, with >1,000 cells analysed per well.
l, Percentage of MBP+ oligodendrocytes generated from a second batch of
OPCs at 72 h following treatment with the indicated structurally analogous
EZH2 inhibitors. n = 4 wells per condition, except DMSO, n = 12, with
>1,000 cells analysed per well. m, Percentage of MBP+ oligodendrocytes
generated from mouse primary OPCs at 72 h following treatment with
EPZ005687. n = 4 wells per condition, except DMSO, n = 12, with >1,000
cells analysed per well. n, Structure of EPZ005687 and structurally
analogous EZH2 inhibitors. o, Representative images of OPCs treated for
72 h with the indicated EZH2 inhibitors. All treatments are at 2 µM. Scale
bar, 100 µm. p, GC-MS-based quantification of two sterol intermediates
following treatment of OPCs with the indicated EZH2 inhibitors at 1 µM
for 24 h. Left, zymostenol; right, zymosterol. n = 2 wells per condition.
q, GC-MS-based quantification of two sterol intermediates following
treatment of a second derivation of OPCs with the indicated EZH2
inhibitors at 1 µM for 24 h. Left, zymostenol; right, zymosterol. n = 2 wells
per condition. r, GC-MS-based quantification of two sterol intermediates
following treatment of mouse primary OPCs with EPZ005687 at 2 µM for
24 h. Left, zymostenol; right, zymosterol. n = 2 wells per condition. All bar
graphs indicate mean ± s.d., and c, d, h, k-o, r are representative of two
independent experiments.
Extended Data Fig. 8 | Effect of combinations of small-molecule
treatments on oligodendrocyte formation, and ability of
oligodendrocytes to track along and wrap electrospun microfibres
after single small-molecule treatments. a, b, Percentage of MBP+
oligodendrocytes generated from OPCs (left, OPC-1; right, OPC-5) at 72
h following treatment with the indicated combinations of liothyronine
and enhancers of oligodendrocyte formation. Unless noted, the following
concentrations were used: ketoconazole, 2.5 µM; benztropine, 2 µM;
clemastine, 2 µM; tamoxifen, 200 nM; liothyronine, 3 µM. n = 4 wells per
treatment condition, with >1,000 cells analysed per well. Lio, liothyronine.
c, d, Percentage of MBP+ oligodendrocytes generated from OPCs at 72
h following treatment with the indicated combinations of ketoconazole
and enhancers of oligodendrocyte formation. n = 4 wells per treatment
condition, with >1,000 cells analysed per well. e, Representative images of
OPCs treated for 72 h with the indicated small molecules. Small-molecule
concentrations are as in a. Scale bar, 100 µm. f, Fold-increase in MBP+
oligodendrocytes following plating of OPCs (OPC-5) onto microfibres and
treatment for 14 days with the indicated pathway modulators. n = 2 wells
per condition, except DMSO, n = 4. g, In an independent experiment,
OPCs (OPC-5) were plated onto microfibres, treated with small molecules
for 4 days, and fixed and stained after 14 days. The extent to which MBP+
oligodendrocytes tracked along the microfibre substrate was measured.
n = 2 wells per condition. h, Total DAPI+ cell number in the experiment
in g. i, Representative images highlighting tracking along the microfibre
substrate. Each image is a montage of four separate images within the same
well. Green, MBP. Scale bar, 100 µm. j, High-resolution images of MBP+
oligodendrocytes tracking along microfibres. Green, MBP; blue, DAPI.
Ketoconazole, 2.5 µM. Scale bar, 50 µm. k, Confocal imaging of OPCs
seeded onto aligned microfibres and treated for 14 days with ketoconazole
(2.5 µM). The plane of the cross-section is highlighted in yellow and the
cross-section, in which green fluorescence appears to encircle several
microfibres, is shown in the bottom panel. Green, MBP; blue, DAPI.
All bar graphs indicate mean ± s.d., and a-d are representative of two
independent experiments.
A single-cell atlas of the airway epithelium reveals
the CFTR-rich pulmonary ionocyte

Lindsey W. Plasschaert, Rapolas Zilionis, Rayman Choo-Wing, Virginia Savova, Judith Knehr, Guglielmo Roma,
Allon M. Klein & Aron B. Jaffe

The functions of epithelial tissues are dictated by the types,
abundance and distribution of the differentiated cells they contain.
Attempts to restore tissue function after damage require knowledge
of how physiological tasks are distributed among cell types, and how
cell states vary between homeostasis, injury-repair and disease. In
the conducting airway, a heterogeneous basal cell population gives
rise to specialized luminal cells that perform mucociliary clearance.
Here we perform single-cell profiling of human bronchial epithelial
cells and mouse tracheal epithelial cells to obtain a comprehensive
census of cell types in the conducting airway and their behaviour
in homeostasis and regeneration. Our analysis reveals cell states
that represent known and novel cell populations, delineates their
heterogeneity and identifies distinct differentiation trajectories
during homeostasis and tissue repair. Finally, we identified a
novel, rare cell type that we call the 'pulmonary ionocyte', which
co-expresses FOXI1, multiple subunits of the vacuolar-type
H+-ATPase (V-ATPase) and CFTR, the gene that is mutated in cystic
fibrosis. Using immunofluorescence, modulation of signalling
pathways and electrophysiology, we show that Notch signalling is
necessary and FOXI1 expression is sufficient to drive the production
of the pulmonary ionocyte, and that the pulmonary ionocyte is a
major source of CFTR activity in the conducting airway epithelium.

The conducting airway is lined with a pseudostratified epithelium
consisting of basal, secretory and ciliated cells, as well as rare
pulmonary neuroendocrine cells (PNECs) and brush cells. Studies of
lineage tracing and regeneration post-injury show that basal cells are
a heterogeneous population that contains the epithelial stem cells.
Basal cells differ in their expression of cytokeratins 14 and 8 (Krt14
and Krt8) and luminal cell fate determinants that are upregulated upon
injury. To identify the full repertoire of basal cell molecular states,
and to identify candidate gene expression programs that might bias
basal cells to self-renew or to adopt differentiated fates, we performed
single-cell RNA profiling on airway epithelial cells. We also sought to
elucidate the molecular composition of rare PNECs and brush cells,
which have fewer lineage markers and are more difficult to define
functionally. Because our approach is unbiased and comprehensive, it
could also identify new cell types with a role in mucociliary clearance.

We performed single-cell RNA sequencing analysis (scRNA-seq) on
7,662 mouse tracheal epithelial cells and 2,970 primary human bronchial
epithelial cells (HBECs) differentiated at an air-liquid interface
(ALI) (Fig. 1a, b). As there are well-documented differences between
mouse and human airways, using these two systems enables comparative
analyses and prioritization of common findings between mouse
and human. This also provided in vivo validation of findings in the
culture model, which lacks non-epithelial cells and uses defined culture
conditions. A similar analysis of mouse tracheal epithelial cells in the
accompanying Article corroborates many of our findings.

We visualized the single-cell data using a graph-based algorithm
(SPRING) that conserves neighbouring relationships of gene
expression, facilitating analysis of differentiation trajectories. The
resulting graphs revealed a non-uniform continuum structure spanning
basal-to-luminal differentiation, with rare gene expression states
representing satellite clusters (see 'Data availability' in Methods).

Using spectral clustering, we partitioned cells into populations with
specific, reproducible gene expression signatures (Fig. 1c, d). On the
basis of enrichment of previously annotated markers (Supplementary
Tables 1, 2), we identified clusters in mouse (Fig. 1c) and human
(Fig. 1d) that represented known cell types: basal, secretory, ciliated,
brush and PNECs. We performed pairwise correlation analysis as a
measure of relatedness between clusters, and curated a list of transcription
factors, surface molecules and kinases expressed in each cluster
(Extended Data Fig. 1, Supplementary Tables 1-3). Our analysis confirmed
previous findings that basal and secretory populations
are heterogeneous, and uncovered additional molecular heterogeneity
(Extended Data Figs 2, 3). Basal cells formed a continuum of states
defined by gene modules associated with a basal-to-luminal gene
expression axis (Krt5 versus Krt8) as well as by variable expression of
genes associated with basement membrane deposition and remodelling.
In both mouse and human, Col17a1/COL17A1 (gene homologues
are written as mouse/human throughout) and IGFBP family members
(Igfbp3/IGFBP6) correlated with the basal cell sub-population marker
Krt14, whereas in mouse an independent Krt14− module associated
with the basal cell adhesion molecule Bcam and with Dcn, which
encodes decorin, a regulator of collagen fibrillogenesis. Among secretory
cells, many differences were associated with different levels of
maturity, with the least-mature cells expressing basal cell transcripts
(for example, Krt5/KRT5 and Trp63/TP63), and the most-mature cells
expressing Muc5b/MUC5B. Secretory cells also differed in other ways.
In human, one cluster associated with antigen presentation (human
leukocyte antigen (HLA) gene family members). In mouse, the secretory
cells appeared to associate with two distinct trajectories from the
basal layer: those expressing Krt4, and those emerging from a Krt4−
state marked by Trp63, Bcam and Dcn. Heterogeneity of both basal and
secretory cells was also associated with tens of other genes with diverse 
functions, including signalling molecules (for example, Wnt0a) and 
early specifiers of mature lineages (for example, Foxj1) (Extended Data
Figs. 2, 3).

Our analysis also revealed gene signatures of epithelial cell states that
have not previously been described. First, the paired cytokeratins 4 and
13 (Krt4/Krt13) defined a unique cluster in the mouse dataset located
between basal and secretory, suggesting that this may be a transitional
cell state (Fig. 1c, Supplementary Table 1). Immunofluorescence of Krt4
in mouse tracheal epithelium demonstrated that it was co-enriched
in subsets of Krt5+ basal cells, Krt8+ luminal cells and Scgb1a1+ club
(secretory) cells, but not in Foxj1+ ciliated cells (Extended Data Fig. 4a).
This pattern is reminiscent of the proposed model for basal luminal
precursors (BLP), a subset of non-transit amplifying basal cells with
upregulated luminal markers. In addition, KRT4/KRT13 expression
Fig. 1 | Single-cell RNA-seq of proximal airway epithelial cells in
mouse and human. a, Mouse tracheal epithelial cells were isolated,
dissociated and collected for scRNA-seq. Human bronchial epithelial
cells (HBECs) were cultured for 1 week submerged, followed by 2 weeks
at an air-liquid interface (ALI) and collected for scRNA-seq. b, Mouse
tracheal epithelium (n = 3 mice) and differentiated HBEC culture (n = 3
donors) are pseudostratified, containing basal cells (KRT5), secretory
cells (Scgb1a1 in mouse; MUC5B in human), and ciliated cells (AcTub,
acetylated α-tubulin). Scale bars, 20 μm. c, d, SPRING plots of scRNA-seq
data for mouse tracheal epithelial cells (n = 4 mice, 7,662 cells) (c) and
HBECs (n = 3 donors, 2,970 cells) (d) coloured by inferred cell type, with
heat maps of lineage-specific genes by biological replicates (rows). Cell
numbers are post-quality control. PNEC, pulmonary neuroendocrine cells.
Lineage markers for PNECs and brush cells were expressed in rare cells in
HBEC cultures, and formed just one human cluster.


was closely correlated and defined a major axis of heterogeneity in basal 
and differentiating HBECs (Extended Data Fig. 3b). 

Second, in the human single-cell map, we identified a FOXN4+ cluster
that was highly enriched for the ciliated cell specification factor
FOXJ1 but low for markers of maturation, including the ciliary component
TUBB4B (Fig. 1d, Supplementary Table 2). Foxn4 is known to
drive robust transcription of ciliated genes during multiciliated cell
differentiation in Xenopus, suggesting that this cluster represents a
state of multiciliated cell differentiation. We confirmed the existence of
this cluster by immunofluorescence, showing that FOXN4 was indeed
enriched in a subset of FOXJ1+ cells but not in cells containing mature
cilia (Extended Data Fig. 4b). Third, in the human data we identified a
novel cluster that was enriched for SLC16A7 (Fig. 1d, Supplementary
Table 2), which encodes the monocarboxylate transporter 2 (MCT2)
that is involved in acidification of cystic fibrosis HBEC cultures, as
well as AIRE, the gene that drives negative selection of self-reactive
T cells in thymic epithelium. This cluster contained the largest number
of highly specific genes in the dataset, with a greater percentage of
mitochondrial genes. This cluster may reflect cellular stress or may
represent a unique antigen-presenting airway epithelial cell.

Finally, a single cluster identified in both mouse and human was
enriched for ion transporters and the transcription factors Foxi1,
Ascl3 and Tfcp2l1 (Fig. 1c, d, Supplementary Tables 1, 2). This cluster
expressed subunits of the V-ATPase proton pump, which are also
expressed in Foxi1-expressing ionocytes in the mucociliary epithelium
of Xenopus larval skin, intercalated cells of the mammalian kidney
and in forkhead-related (FORE) cells of the inner ear. This cluster
Fig. 2 | Single-cell RNA-seq reveals recovery-specific cell states. a, Mice
were administered 2% polidocanol by oral-pharyngeal aspiration, and
tracheae were collected 1 (n = 1), 2 (n = 1), 3 (n = 1) and 7 (n = 3) dpi
for scRNA-seq. Immunofluorescence for basal cells (Krt5) and lumenal
markers (AcTub and Scgb1a1) shows that lumenal lineages are shed at
1 dpi (n = 3), basal population expands at 2 dpi (n = 4), mature lumenal
markers are visible at 3 dpi (n = 3) and the differentiated epithelium is
restored by 7 dpi (compare to Fig. 1b; n = 3). b, SPRING plot of scRNA-seq
data showing cells from uninjured (n = 7,898) and regenerating (n = 6,265)
mice. Cell states that emerge during regeneration are shown in grey (see
Extended Data Fig. 5a, Methods). c, Top, enrichment of scRNA-seq cell
states compared to uninjured. Bottom, relative abundance of cell types
at each time point. Rare class includes ionocytes, brush cells and PNECs.
d, Expression patterns of keratin genes in basal cells change between
uninjured trachea and 1 dpi. The heat maps show imputed expression
counts, with range from the 5th to the 95th percentile. Basal and Krt4/
Krt13+ cluster cells shown.


was highly enriched for Cftr, the gene that encodes a critical chloride
channel that is mutated in cystic fibrosis, as well as for genes encoding
multiple CLC chloride channels (for example, NKCC1, ClC-Kb),
the calcium-activated potassium channel KCNMA1 and members of
the Slc9 family of Na+/H+ exchangers (Nhe4 in mouse and NHE7 in
human). We named these cells pulmonary ionocytes.

To further identify cell states in the conducting airway that may
emerge or expand following injury, we performed scRNA-seq and
immunofluorescence on regenerating tracheas at 1, 2, 3 and 7 days
after polidocanol-induced injury (Fig. 2a). We visualized transcriptomes
of these cells using a SPRING graph, expanded to 14,163 cells to
reveal detailed changes in epithelial cell states during repair (Fig. 2b).
This identified two states specific to injury response (Extended Data
Fig. 5a, Methods). The first state appeared at 1 day post-injury (dpi)
(Fig. 2b (light grey), c), corresponding to Krt5+ basal cells in cycle and
co-expressing additional cytokeratins including Krt14, Krt6a and Krt4/
Krt13, which were largely non-overlapping in homeostasis (Fig. 2d).
The second injury-specific state, which appeared at 2 and 3 dpi,
included cells transiting directly from basal to ciliated (Fig. 2b (dark
grey), c) rather than differentiating through a secretory progenitor
(Fig. 1c). We detected 1,237 genes that varied in expression during
multiciliated cell differentiation, including the specification factors
Foxj1, Myb and Mcidas (Extended Data Fig. 6, Supplementary
Table 4). Early secretory cell states also reappeared at 2 and 3 dpi.
By 7 dpi, the relative abundance of cell populations, including rare
Fig. 3 | FOXI1 specifies a novel cell type, the CFTR-rich 'pulmonary
ionocyte'. a, Immunofluorescence of FOXI1 (red, arrow), and airway
lineage markers (green, arrowheads): TP63 (basal), FOXJ1 (ciliated),
MUC5B (secretory) and ASCL1 (PNEC) in differentiated HBEC cultures
(n = 3 donors). b, Fluorescent in situ hybridization in mouse tracheal
epithelium (n = 3 mice) and human bronchial epithelium (n = 2 donors)
for FOXI1 (red) and CFTR (green). c, HBECs were transduced at seeding
with GFP or GFP-FOXI1 lentivirus, differentiated and then profiled by
scRNA-seq or analysed by immunofluorescence. d, Immunofluorescence
for ATP6V1B1 (white) and FOXI1 (red) in HBECs transduced with GFP
or GFP-FOXI1 (n = 4 experiments from two donors). Scale bars, 20 μm.
e, Fold change in fractions of cell states revealed by scRNA-seq in


populations (PNECs, brush cells and pulmonary ionocytes) largely
returned to that seen in uninjured tracheae (Fig. 2c, Extended Data 
Fig. 5b). 

Our data open up a range of possible avenues for future research,
from the importance of the gene modules defining basal and secretory
cell heterogeneity to the catalogue of potential regulators and
components for rare PNECs and brush cells, premature ciliated cells,
Krt4/Krt13+ cells and pulmonary ionocytes. In this study, we focus
on the localization, specification and function of the newly identified
pulmonary ionocyte.

We first validated the presence of the pulmonary ionocyte population
by immunofluorescence. FOXI1 labelled 1-2% of HBECs and,
as predicted, was distinct from basal cells (labelled by TP63), secretory
cells (MUC5B), ciliated cells (FOXJ1) or neuroendocrine cells
(ASCL1) (Fig. 3a). Immunostaining demonstrated apical enrichment
of the V-ATPase in FOXI1+ HBECs, similar to what has been shown
for other Foxi1 epithelial lineages, as well as nerve growth factor
receptor (NGFR, Extended Data Fig. 4), confirming the marker gene
enrichment identified by scRNA-seq (Extended Data Fig. 1b). Also
as predicted, CFTR mRNA was highly enriched in FOXI1-expressing
cells in mouse trachea and primary human bronchial tissue (Fig. 3b),
compared to the low expression throughout the epithelium (Extended
Data Fig. 7d, e). FOXI1+ cells were more concentrated in bronchial
gland ducts than in the surface epithelium (Extended Data Fig. 7), a
pattern similar to previously described rare, CFTR™* cells 

In other proton-secreting cells, Foxi1 specifies the lineage and regulates
expression of V-ATPase subunits; therefore, we next asked
whether FOXI1 was sufficient to specify pulmonary ionocytes in
HBEC cultures. We performed scRNA-seq of cells transduced with
lentivirus expressing GFP-FOXI1 (n = 10,330) or GFP alone (n = 9,436)
GFP-FOXI1 versus GFP. Heat map values correspond to the ratio of cell
numbers from the viral transduction experiments projecting onto each
point of the reference HBEC dataset from Fig. 1d. Extended Data Fig. 8e
extends this to populations specific to viral transduction. g, Ionocytes
induced by GFP-FOXI1 are transcriptionally similar to natural ionocytes,
shown by comparing their gene expression in scRNA-seq data from three
experimental conditions (reference data from Fig. 1d). The genes shown
are markers of each epithelial cell type (bottom), with ionocyte markers
shown in detail (top). Genes are normalized to the median expression level
across populations observed in a given condition. ADGRF5 was previously
known as GPR116.


and mapped the data onto the reference HBEC state map (Fig. 3c,
Extended Data Fig. 8a-e). Cultures transduced with GFP-FOXI1 had
significantly higher numbers of cells classified as ionocytes (23-fold 
increase, P< 10 "by Fisher's exact test, Fig 3e f), with slight reduc 
tions in basal and ciliated cells. The resulting ionocytes expressed high 
levels of exogenous GFP-FOXI1 (Fig. 3g, Extended Data Fig. 8), and
exhibited the same transcriptional program as unperturbed ionocytes.
Moreover, they did not express markers of other cell types (Fig. 3g).
Immunostaining (Fig. 3d), quantitative PCR with reverse transcription
(RT-qPCR) profiling of marker genes, and RNA in situ hybridization
performed on transduced cultures (Extended Data Fig. 7a-c)
confirmed these results, indicating that FOXI1 is sufficient to specify
CFTR-rich pulmonary ionocytes. FOXI1 overexpression also led to the
appearance of a novel non-ionocyte cell state, possibly resulting from
off-target FOXI1 transcriptional activity (Extended Data Fig. 8c,
Supplementary Table 3).
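
The enrichment of ionocytes after FOXI1 transduction is assessed above with Fisher's exact test on cell counts. As a minimal sketch of how such a test can be computed from a 2x2 table, the following uses only the Python standard library. The counts are hypothetical, chosen only to give a 23-fold difference, and are not the counts from this study; the function name is also illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every possible top-left cell value
    probs = {k: comb(row1, k) * comb(row2, col1 - k) / denom
             for k in range(k_min, k_max + 1)}
    p_obs = probs[a]
    # Sum over all tables that are no more probable than the observed one
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))

# Hypothetical counts (assumption, not data from the paper)
iono_foxi1, other_foxi1 = 230, 9770   # GFP-FOXI1 culture
iono_gfp, other_gfp = 10, 9990        # GFP-only culture

fold_change = (iono_foxi1 / (iono_foxi1 + other_foxi1)) / (iono_gfp / (iono_gfp + other_gfp))
p_value = fisher_exact_two_sided(iono_foxi1, other_foxi1, iono_gfp, other_gfp)
```

The two-sided convention used here (summing all tables at most as probable as the observed one) is one common choice; the source does not state which variant was applied.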

In Xenopus epidermis, ionocytes differentiate from an inner layer of
basal cells, and their specification is regulated by Notch signalling,
a pathway that is important in airway basal cell fate. The specification
of pulmonary ionocytes shows clear similarities. Foxi1+ cells first
reappeared in the basal cell pool following injury depletion (Extended
Data Fig. 5c), and Foxi1 co-localization with the basal cell marker
Krt5 transiently increased after injury (46.3% of Foxi1+ cells at 3 dpi
compared to 15.4% at steady state; Fig. 4a). This suggests that Foxi1+
cells have a direct basal cell origin, consistent with steady-state lineage
tracing studies. Notch target genes are also expressed in pulmonary
ionocytes (Extended Data Fig. 9). Treating HBEC cultures with the
γ-secretase inhibitor DAPT decreased Notch target gene expression
and increased ciliated cell specification, consistent with previous studies
in which Notch signalling was modulated in mouse and human airway
Fig. 4 | Pulmonary ionocytes are a major source of CFTR activity.
a, Immunofluorescence for Krt5 (green) and Foxi1 (red) in mouse tracheae
at homeostasis (left, n = 3) or 3 dpi (right, n = 3). Arrowhead, Foxi1 Krt5
cells; arrow, Foxi1 Krt5 cell. Scale bars, 20 μm. b, Immunofluorescence
and quantification for ionocytes (FOXI1, red, arrowheads) and
ciliated cells (FOXJ1, green) in HBEC cultures treated with DMSO or
DAPT. Scale bar, 100 μm; n = 4 experiments from one donor. *P = 0.01,
S*5P—1.1 x 10-*by to-talled F-test. c, HBECs were treated with DMSO 
or DAPT upon ALI culture. After differentiation (2-8 weeks) cultures
were loaded into Ussing chambers and short-circuit current (Isc) was
recorded during addition of amiloride, forskolin, and a CFTR inhibitor,
CFTR(inh)-172. A representative tracing from donor 1 (n = 11) is shown.
d, Change in short-circuit current (ΔIsc) in response to forskolin measured
in DMSO (n = 7 cultures per donor) or DAPT-treated cultures (

cultures per donor). **=P <1 10 * by two-tailed test, e, Donor mean 
ΔIsc in response to forskolin plotted against mean number of ionocytes
(FOXI1+) or ciliated cells (FOXJ1+) (n = 7). All data are mean ± s.e.m.
R, Pearson correlation with associated P value. NS, not significant.


cultures (Extended Data Fig. 10a). DAPT treatment also significantly
decreased the number of ionocytes (Fig. 4b). Treating HBEC cultures
with antibodies against individual Notch receptors also reduced ionocyte
numbers (Extended Data Fig. 10).

We next investigated the functional importance of the high level
of CFTR expression in pulmonary ionocytes. Ciliated cells have been
hypothesized to be the major source of CFTR in the proximal airway,
but we found little to no CFTR expression in FOXJ1+ ciliated
cells (Supplementary Table 2, Extended Data Fig. 7). To examine
CFTR activity in the proximal airway epithelium, we recorded CFTR-mediated
ion transport in HBEC cultures using Ussing chambers
(Fig. 4c). DAPT-treated cultures, which have fewer ionocytes
and more ciliated cells, had significantly lower CFTR
activity in response to forskolin (measured as short-circuit current,
Isc; Fig. 4d). We also used natural variation between donors to assess
the sensitivity of CFTR activity to changes in ionocyte numbers versus
changes in ciliated cell numbers. Ussing experiments and cell-type
quantification in cultures derived from seven different donors showed
that CFTR activity was positively correlated with ionocyte number
(Pearson's R = 0.93, P = 0.002) and not correlated with ciliated cell number
(R = 0.44, P = 0.32), with ionocytes explaining 60% of the mean
channel current compared to just 4% for ciliated cells (after multivariate
regression) (Fig. 4e). These data suggest that ionocytes are a major
source of CFTR activity in airway epithelium despite representing only
1-2% of epithelial cells.
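
The donor-level comparison above rests on Pearson correlation between cell-type abundance and forskolin-induced current. The sketch below shows how a Pearson coefficient and its t statistic (n - 2 degrees of freedom) can be computed by hand. The donor values are invented for illustration and are not measurements from this study, and the function names are assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))

# Hypothetical per-donor means (assumption): ionocyte counts and current change
ionocytes = [4, 6, 9, 11, 15, 18, 22]
delta_isc = [3.1, 4.0, 6.2, 6.8, 9.5, 10.1, 13.0]

r = pearson_r(ionocytes, delta_isc)
t = t_statistic(r, len(ionocytes))
```

Converting the t statistic to a P value requires the Student's t distribution, which is left out here to keep the sketch dependency-free.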

In this study, we applied large-scale single-cell profiling to conduct
an unbiased investigation of the composition of the proximal airway
epithelium during homeostasis and regeneration. In doing so, we unexpectedly
identified the pulmonary ionocyte, a rare cell type that appears
to be a major, possibly dominant, source of CFTR activity in airway
epithelium. This cell type shows co-enrichment of the proton-secreting
V-ATPase and the anion-secreting CFTR channel, suggesting a role in
luminal pH regulation that could be relevant for the pathology of cystic
fibrosis. The role of these CFTR-rich cells in airway physiology and
disease remains to be elucidated, but it is likely that their identification
will better inform future therapeutics for cystic fibrosis. Finally,
our study provides a comprehensive atlas of genes and pathways with
potential roles in promoting differentiation and repair, delineates the
cell types, transcriptional profiles and trajectories present in the proximal
airway in both homeostatic and regenerating tissues, and offers a
baseline for future profiling of disease states.
METHODS
HBEC culture and Notch inhibition. Primary human bronchial epithelial cells
(HBECs) from normal donors aged 3-42 were obtained from Lonza (CC-2540;
Lot 21175, 325353, 429581, 105104, 621713, 312626 and 441099) and were
expanded twice with growth medium (500 ml BEGM medium (Lonza, CC-3171),
1 SingleQuots kit (Lonza, CC-4175)) in T75 flasks. After expansion, HBECs were
seeded on 12-well Transwell plates (Corning, 460) at a density of 8,000 cells
per Transwell. The cells were cultured in differentiation medium (250 ml BEGM
medium, 250 ml DMEM medium (Thermo Fisher, 11965092), 1 SingleQuots kit)
on both apical and basal sides of the Transwell for the first 7 days. Then, medium
was removed from the apical side, and cells were cultured for another two weeks at
an ALI condition. Cells were used for analysis after culture at ALI for 14 days and
no more than 28 days.

Notch signalling was inhibited by adding 3.3 μM DAPT or DMSO to differentiation
medium when HBECs were cultured at ALI. Notch antibodies against receptors
NOTCH1, NOTCH2 and NOTCH3 and control IgG antibody were described
previously and were added at a concentration of 10 μg/ml to HBECs upon
culture at ALI.
Single-cell dissociation. HBECs were collected after 15 days of ALI culture using
0.05% trypsin-EDTA (Thermo Fisher, 25300054). Cells were then pelleted at 300g
for mia, resuspended in PBS and fered through 3 20-um strainer (PlurSdect, 
43-50020-03). Cells were counted on a haemocytometer and Optiprep (Sigma-Aldrich,
D1556) was added to achieve a final concentration of 15% and 75,000
cells/ml.

C57/BL6 male and female mice from the Jackson Laboratory aged 6-8 weeks
were used for all studies. Animals were handled in accordance with Novartis
Institutes for Biomedical Research Animal Care and Use Committee protocols
and regulations. Mice were housed in a temperature- and humidity-controlled
animal facility with ad libitum access to food and water and acclimated for at least
3 days before experimental manipulation. For single-cell isolation for scRNA-seq,
tracheas were dissected and opened longitudinally in Ham's F12 (Life Technologies,
11765-054) plus 1% penicillin-streptomycin on ice. Each trachea was individually
placed in a 15-ml conical tube with 5 ml 5 mg/ml Pronase (Roche, 10165921001)
{in Hans F12 plas 1% peniclin-streponsycin and incubated for 18h at °C. Five 
hhundted microltes FS was added to inactivate pronase and conical tubes were 
‘vigorously inverted to dislodge cel. Each trachea was transfered tice to 15-ml 
‘conical tube containing Hams F12 plus 1% penclin-streponsycin plas 10% FBS 
fndithen inverted. Medium from each of the three tubes was pooled and cells wete 
pelleted at fr 10 min st 4°C, Cel were esuspended in 500 DNase solution 
(0.5 mg/ml DNase (Sigma-Aldrich, DN25), 10 mg/ml BSA in HamisF12 + 1% 
peniclin-streptomyein). incubated on ice for 5 min and then plleted a 00g for 
Wain. 4°C. Calle were then washed ice in Hanis FL2 1% pentlin-steptomycin 
10% FRS and then resuspended in PRS +-0.02% BSA. Cll were diluted 4 90,000 
calli in 15% Optipeep + 0.02% BSA in PBS for sc RNA seq 
Single-cell transcriptome barcoding in drops and library preparation for
Illumina sequencing. For scRNA-seq, we used inDrops following the protocol
previously described with the modifications summarized in Supplementary
Table 5. In brief, dissociated single cells were co-encapsulated into 3-4 nl droplets
together with hydrogel beads carrying barcoding reverse transcription primers.
Following reverse transcription in droplets, the emulsion was broken and the
bulk material was taken through the following steps: (i) second strand synthesis;
(ii) linear amplification by in vitro transcription (IVT); (iii) amplified RNA
fragmentation; (iv) reverse transcription; (v) PCR. A subset of the GFP-FOXI1 libraries
were processed using small variations on the published protocol, including a different
reverse transcriptase and exclusion of HinfI digestion. The resulting libraries
were sequenced either on a HiSeq or NextSeq Illumina platform in paired-end
mode to a length of 2 × 100 or 2 × 76 base pairs (see Supplementary Table 5).
Images from the instrument were processed using the manufacturer's software to
generate FASTQ sequence files. Read quality was assessed by running FASTQC
(wou, 

Obtaining transgene counts in cells transduced with lentivirus. The single-cell
RNA-seq method we used allows the detection of transcript sequences up to ~1 kb
upstream of the polyA tail. Both GFP and GFP-FOXI1 transcripts share the same
1.3-kb-long sequence upstream of the transcription termination and polyadenylation
site within the lentiviral 3' long terminal repeat (LTR). This 1.3-kb sequence,
which is part of the plenti6/V5-DESENGFP Gateway scaffold, was added to the
reference transcriptome to identify transgene counts. Notably, 'transgene' refers
to either GFP-FOXI1 or GFP, depending on the dataset. In Fig. 3g, the transgene
was added manually to the heat map (top), as was FOXJ1 (bottom heat map), a
canonical marker of multiciliated cells that failed to appear as a unique marker gene
because of its expression in the FOXN4+ cluster.
Single-cell RNA-seq data analysis. Processing of sequencing reads. To generate
per-cell gene expression counts from raw sequencing reads, we used an updated and
publicly available version (https://github.com/indrops) of the custom sequencing
data-processing pipeline described. Parameters used with the indrops.py pipeline
are specified in yaml files provided as Supplementary Files 1 and 2. In brief, raw
reads (FASTQ) were filtered for sequencing quality and expected structure, sorted
based on barcode sequences (reads derived from the transcriptome of the same
call carry the same barcode) and aligned to ether mm 10 or gl cDNA reference 
with separately added mitochondrial cDNA sequences. To quantify gene expression
while correcting for amplification biases, we made use of unique molecular
identifiers (UMIs) introduced during reverse transcription in drops. The output
of low-level processing is a genes × cells expression matrix.
Single-cell data cleanup and normalization. To ensure high-quality data for further
analysis, we filtered out cells with few counts and a high mitochondrial gene
fraction. Thresholds were selected by visually inspecting histograms of counts
per cell and mitochondrial fraction per cell for each biological sample separately.
With human samples, a mitochondrial fraction threshold of 25% and total count
thresholds of 1,500, 1,500 and 2,000 were used for donors 1, 2 and 3, respectively.
The same thresholds as for donor 1 were used in the GFP/GFP-FOXI1 overexpression
experiment. For mouse, a mitochondrial fraction threshold of 20% and
total count thresholds of 1,500 were used for all datasets. Initial visualization and
clustering (see below) revealed that a small fraction of mouse cells (<4%) formed
well-separated clusters characterized by a strong immune gene signature. These
cells were excluded from further analysis. For uninjured mouse data, we applied
an additional clean-up step to remove cell doublets, which can occur rarely owing
to incomplete cell dissociation or owing to two cells occasionally entering the
same microfluidic barcoding droplet. In brief, a decoy training set of simulated
doublets is generated by randomly combining single-cell transcriptomes from the
dataset. This decoy training set is used to train a k-nearest neighbour classifier. Cell
transcriptomes classified as in silico doublets were excluded from further analysis.
The detailed method will be published elsewhere. Data was then normalized by the
total counts per cell as described, with the following modification: to calculate the
normalization factor (total counts per cell) we excluded any gene with expression
level >5% of total counts in at least one cell.
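
The filtering and normalization steps above can be sketched as follows. This is a minimal illustration, not the authors' code: recognising mitochondrial genes by an `mt-` prefix, treating the thresholds as inclusive, and scaling to one million counts are all assumptions, and the toy cells are hypothetical.

```python
def total_counts(profile):
    return sum(profile.values())

def mito_fraction(profile, mito_prefix="mt-"):
    # Assumption: mitochondrial genes are recognised by name prefix
    total = total_counts(profile)
    mito = sum(c for g, c in profile.items() if g.lower().startswith(mito_prefix))
    return mito / total if total else 0.0

def filter_cells(cells, min_counts, max_mito):
    """Keep cells with enough counts and a low mitochondrial fraction."""
    return {cid: p for cid, p in cells.items()
            if total_counts(p) >= min_counts and mito_fraction(p) <= max_mito}

def normalize(cells, max_gene_fraction=0.05, target=1e6):
    """Total-count normalization, excluding dominant genes from the size factor."""
    # Genes exceeding max_gene_fraction of total counts in at least one cell
    dominant = {g for p in cells.values() for g, c in p.items()
                if c > max_gene_fraction * total_counts(p)}
    normalized = {}
    for cid, p in cells.items():
        factor = sum(c for g, c in p.items() if g not in dominant)
        if factor == 0:
            raise ValueError(f"no non-dominant counts left in cell {cid}")
        normalized[cid] = {g: target * c / factor for g, c in p.items()}
    return normalized

# Hypothetical toy data: cell_id -> gene -> UMI count
cells = {
    "a": {"Krt5": 60, "Scgb1a1": 30, "mt-Co1": 10},
    "b": {"Krt5": 5, "mt-Co1": 5},
}
kept = filter_cells(cells, min_counts=50, max_mito=0.25)
# A 50% cut-off is used only because the toy cells have three genes
norm = normalize(kept, max_gene_fraction=0.5)
```

Excluding dominant genes from the size factor keeps a few very highly expressed transcripts from depressing the normalized values of every other gene in that cell.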

Data visualization using SPRING and clustering. To visualize the high-dimensional
gene expression data, we applied SPRING, a method for building a k-nearest
neighbours (kNN) graph of cells and representing it in 2D using a force-directed
layout. Clusters were identified by applying spectral clustering on the same adjacency
matrix as used for SPRING (implementation in Python, sklearn.cluster.
SpectralClustering(affinity='precomputed', assign_labels='discretize')). Clusters
were assigned labels (for example, secretory, basal) based on marker gene expression.
In the SPRING plot of human data (Fig. 1d), clusters representing intermediate
states with no unique gene expression are shown in grey.

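As a rough sketch of the graph that underlies both the SPRING layout and the clustering step, the code below builds a symmetric, unweighted kNN adjacency matrix from cell coordinates. Treating each cell as a point in some reduced expression space and using Euclidean distance are assumptions made for illustration; SPRING itself is not reimplemented here.

```python
from math import dist

def knn_graph(points, k):
    """Return a symmetric 0/1 adjacency matrix linking each point to its k nearest neighbours."""
    n = len(points)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other points by Euclidean distance to point i
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))
        for j in order[:k]:
            adjacency[i][j] = 1
            adjacency[j][i] = 1  # symmetrize: the graph is undirected
    return adjacency

# Hypothetical 2D coordinates for four cells forming two tight pairs
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
graph = knn_graph(points, k=1)
```

A matrix of this shape is the kind of input that scikit-learn's `SpectralClustering` accepts when `affinity='precomputed'` is set, which is the call named in the text.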
Cell population-specific gene identification (Fig. 1). To be considered as specific
to population i, a gene had to satisfy the following criteria: (a) be expressed at
a significantly higher level in population i compared to all other cells as determined
by a two-sided permutation test using the difference in sample means as the test
statistic (false discovery rate (FDR) <5%). To be considered for statistical testing,
a gene had to be detected in at least 1% of cells on either side of the comparison.
(b) Average expression >50 transcripts per million (TPM) in population i.
(c) Average expression in population i at least 1.5-fold higher than in any other
population (that is, maximum-to-second-maximum ratio ≥1.5). A pseudo value
of 10 TPM was added before division. (d) Be maximum in population i for 4/4
(mouse) or 2/3 (human) of the biological replicates.
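
Criteria (b) and (c) above translate directly into a small filter on per-population mean expression. The sketch below is illustrative only: it assumes the pseudo value is added to both numerator and denominator, it omits the permutation test of (a) and the replicate check of (d), and the expression values are invented.

```python
def specific_genes(mean_tpm, min_tpm=50.0, min_ratio=1.5, pseudo=10.0):
    """Select genes passing the expression and fold-ratio criteria.

    mean_tpm: dict gene -> dict population -> mean expression in TPM.
    Returns dict gene -> (top population, max-to-second-max ratio).
    """
    hits = {}
    for gene, by_pop in mean_tpm.items():
        ranked = sorted(by_pop.items(), key=lambda kv: kv[1], reverse=True)
        (top_pop, top), (_, second) = ranked[0], ranked[1]
        # Pseudo value damps ratios driven by near-zero denominators
        ratio = (top + pseudo) / (second + pseudo)
        if top > min_tpm and ratio >= min_ratio:
            hits[gene] = (top_pop, ratio)
    return hits

# Hypothetical mean expression values (TPM)
mean_tpm = {
    "Foxi1": {"ionocyte": 290.0, "basal": 0.0, "ciliated": 5.0},
    "Krt5": {"ionocyte": 100.0, "basal": 120.0, "ciliated": 90.0},
    "LowGene": {"ionocyte": 40.0, "basal": 0.0, "ciliated": 0.0},
}
hits = specific_genes(mean_tpm)
```

In this toy input only `Foxi1` passes: `Krt5` is too evenly expressed and `LowGene` falls below the 50 TPM floor.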

Figure 1c, d shows the expression level of the top 50 such hits ordered by
decreasing maximum-to-second-maximum ratio. For each gene, 100% was set
at the maximum expression per cluster (average of all replicates). The colour
was saturated at 20% (low) and 100% (high). Detailed gene lists are provided as
Supplementary Tables 1-3.

"Fr Extended Data Fig. 1, transcription factor lists were obtained from animal 
DB", and GO terms GO.0016301 and GO:0009%, including ay descendent 
terms, were used for kinases and surtace molecules respectively 
Identification of correlated gene modules within basal and secretory cells. To
characterize the heterogeneity within basal and secretory cells, we identified modules of
correlated genes. For mouse, we performed the following steps (see also Extended
Data Figs. 2, 3): (a) select basal cells (same procedure for secretory); (b) identify
3,000-5,000 most variable genes; (c) calculate gene-gene rank correlation;
(retain gnesthathaver> 0.2 with atleast 4 other genes—in mouse, Kt did 
‘not met this criterion, and was therefore was included manually 
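
Steps (c) and (d) can be sketched as a rank-correlation filter. This reading assumes that the retained genes are those whose Spearman correlation exceeds 0.2 with at least 4 other genes; the helper names and toy expression vectors are hypothetical.

```python
from math import sqrt

def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def spearman(x, y):
    # Rank correlation is the Pearson correlation of the ranks
    return pearson(ranks(x), ranks(y))

def correlated_genes(expr, min_r=0.2, min_partners=4):
    """Keep genes correlated above min_r with at least min_partners other genes."""
    genes = list(expr)
    keep = []
    for g in genes:
        partners = sum(1 for h in genes
                       if h != g and spearman(expr[g], expr[h]) > min_r)
        if partners >= min_partners:
            keep.append(g)
    return keep

# Hypothetical expression across five cells: g1-g5 co-vary, g6 runs opposite
expr = {
    "g1": [1, 2, 3, 4, 5], "g2": [2, 4, 5, 7, 9], "g3": [0, 1, 1.5, 3, 8],
    "g4": [5, 6, 7, 8, 9], "g5": [1, 3, 4, 6, 7], "g6": [9, 7, 5, 3, 1],
}
module = correlated_genes(expr)
```

The pairwise loop is quadratic in the number of genes, which is fine for a sketch but would need vectorizing for several thousand variable genes.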

"Heat map rows and columns (Extended Data Figs. 2, 3) were hierarchically 
clustered (distance defined as 1—yuqus Wat linkage). For human data, we st 
considered the basal, secretory and intermediate state cells collectively to identify
two main modules of anti-correlated genes (Extended Data Fig. 3). From there, we
selected genes specific to basal and recalculated gene-gene correlation but within
basal cells only. The same was performed with secretory cells.

Smoothing (data imputation). Smoothing was carried out in Fig. 2c, d, and
Extended Data Figs. 5a, 8c, 9. Data shown in other figures are not subject to
smoothing/imputation. Data smoothing, or equivalently imputation, was carried
out using a graph diffusion approach on the nearest neighbour graph G defined
above by SPRING. G is an unweighted undirected graph. The smoothing operation
replaces a scalar quantity $x_i$ on node $i$ of the graph, for example, the raw
expression level of a gene, with a smoothed value $x^{*} = O_\beta x$, in which the
smoothing operator is $O_\beta = e^{-\beta L}$ and $L$ is the random walk graph
Laplacian of G. The smoothing operator accepts a single parameter $\beta$, which
determines the kernel size, that is, the extent of smoothing. This parameter is
equivalent in physical terms to diffusion time: longer times lead to broader
diffusion. For all plots shown, we used …
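A diffusion-type smoothing of this kind can be sketched as below. It is a minimal illustration under stated assumptions: the operator is taken to be the heat kernel exp(−βL) with L = I − D⁻¹A, it is applied through a truncated Taylor series (adequate only for small graphs and modest β), and the names `random_walk_laplacian` and `smooth` are hypothetical. The value of β in the example is arbitrary and is not the value used in the study.

```python
import numpy as np

def random_walk_laplacian(adj):
    """L = I - D^-1 A for an unweighted, undirected graph without isolated nodes."""
    deg = adj.sum(axis=1)
    return np.eye(adj.shape[0]) - adj / deg[:, None]

def smooth(adj, x, beta, n_terms=60):
    """Apply exp(-beta * L) to x using a truncated Taylor series."""
    lap = random_walk_laplacian(adj)
    term = np.asarray(x, dtype=float).copy()
    out = term.copy()
    for k in range(1, n_terms):
        term = (-beta / k) * (lap @ term)   # k-th Taylor term of exp(-beta L) x
        out += term
    return out

# Toy graph: a path of 5 nodes with a single expressing cell in the middle.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
flat = smooth(adj, np.ones(5), beta=1.0)
spike = smooth(adj, np.array([0.0, 0.0, 1.0, 0.0, 0.0]), beta=1.0)
print(np.round(spike, 3))
```

A constant signal is left unchanged, whereas a single spike spreads to its neighbours; larger β spreads it further.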
Analysis of density changes relative to uninjured. To visualize which cell
populations are enriched at a given time point relative to uninjured (Fig. 2), the
following was performed for every time point t of the mouse recovery data:
(a) Let every cell from t vote for its 10 nearest neighbours among all mouse cells
and count votes. (b) Smooth vote counts on the graph (see previous section for
smoothing). Smoothed vote counts are a proxy for the density of cells from time
point t on the graph (see also the two left-most plots of Extended Data Fig. 5a).
(c) Normalize the total vote count to 1. (d) Divide the density at time point t by
the density of cells in uninjured.
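The voting and ratio steps can be sketched as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, distances are plain Euclidean in whatever coordinates are supplied, a cell is assumed not to vote for itself, the graph smoothing of step (b) is omitted for brevity, and the small constant `eps` that avoids division by zero is an added assumption.

```python
import numpy as np

def vote_counts(all_cells, voter_idx, k=10):
    """Each voter adds one vote to each of its k nearest neighbours among all cells."""
    counts = np.zeros(len(all_cells))
    for i in voter_idx:
        d = np.linalg.norm(all_cells - all_cells[i], axis=1)
        d[i] = np.inf                       # no self-votes (assumption)
        counts[np.argsort(d)[:k]] += 1
    return counts

def density_ratio(all_cells, idx_t, idx_uninjured, k=10, eps=1e-9):
    """Normalized vote density at time point t divided by the uninjured density."""
    dens_t = vote_counts(all_cells, idx_t, k)
    dens_u = vote_counts(all_cells, idx_uninjured, k)
    dens_t /= dens_t.sum()                  # step (c): total vote count set to 1
    dens_u /= dens_u.sum()
    return dens_t / (dens_u + eps)          # step (d)

# Toy data: two well-separated clusters; the later time point occupies only one.
rng = np.random.default_rng(1)
cells = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(10, 1, (20, 2))])
uninjured = list(range(0, 10)) + list(range(20, 30))
late = list(range(30, 40))
total_votes = vote_counts(cells, late, k=5).sum()
ratio = density_ratio(cells, late, uninjured, k=5)
```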
Identification of recovery-specific cell populations. The procedure is summarized
in Extended Data Fig. 5a. To identify recovery-specific cell populations in the
SPRING plot combining all mouse data (populations highlighted in Fig. 2), we first
performed steps (a) and (b) described in the previous paragraph to determine the
density of uninjured cells on the graph. Next, a threshold of 25 smoothed counts was
selected by visual inspection of the distribution of votes, and all cells receiving fewer
than 25 votes were considered depleted in uninjured (that is, recovery specific).
Recovery-specific cells were split into two clusters by spectral clustering, and labels
were assigned based on characteristic gene expression. Cells from the mouse injury
time course experiment that were not recovery specific inherited the label of
their single nearest neighbour in uninjured mouse data (Euclidean distance in
principal component space of most variable genes).
Analysis of the recovery-specific basal-to-ciliated transition. The procedure is also
summarized in Extended Data Fig. 6. Six hundred and nine recovery-specific cells
from 24-72 h post injury, forming a continuum between basal and ciliated
cells, were manually selected on the SPRING plot and subjected to population balance
analysis (PBA), a method developed in our laboratory for describing differentiation
trajectories. For this analysis, 'source' and 'sink' cell populations were defined as
the basal and multiciliated tips of the cell kNN graph, respectively. Cells were then
ordered on the graph by the diffusion potential parameter of PBA (a measure
of progression from source to sink). To smooth the gene expression
of individual cells, a moving average with a window size of 100 cells was taken.
Identification of differentially expressed genes along the basal-to-ciliated trajectory.
Temporally varying genes were identified using a previous method with minor
changes. Before statistical testing, the following filters were applied on the all-gene
list considering only the 609 cells forming the basal-to-ciliated trajectory:
(a) expression level at least 3 normalized counts in at least … cells; (b) variable:
Fano factor > 1.

Notably, none of these filters considers the cell ordering. For each of
the surviving 4,561 genes, a statistic was calculated from a vector with the
expression level of the gene in the 510 averaged cells after
application of a moving average over cells ordered using PBA. The procedure was
repeated on shuffled cells for multiple permutations, each time resulting in a
null value. The one-sided P value for each gene was defined as the fraction of times
the null value was at least as large as the observed one. To account for multiple
hypothesis testing, the false discovery rate was controlled at 5% using the
Benjamini-Hochberg procedure. For each of the 4,561 genes used in the permutation
test, we also calculated the maximum fold change, FC_max. One thousand two hundred
and thirty-seven genes passing the FC_max threshold with FDR < 5% were considered
differentially expressed along the basal-to-ciliated trajectory.
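The two statistical steps named above, a one-sided permutation P value followed by Benjamini-Hochberg control of the false discovery rate, can be sketched in plain Python. The test statistic itself is not reproduced here, so the inputs are assumed to be precomputed numbers; the function names are hypothetical.

```python
def permutation_p_value(observed, null_values):
    """One-sided P value: fraction of null values at least as large as the observed one."""
    exceed = sum(1 for v in null_values if v >= observed)
    return exceed / len(null_values)

def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking hypotheses rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank              # step-up rule: keep the largest passing rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= cutoff_rank
    return significant

p_example = permutation_p_value(3.0, [1.0, 2.0, 3.0, 4.0])
calls = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
print(p_example, calls)
```

Because the procedure is step-up, a gene whose own P value misses its rank threshold is still called significant when a larger P value further down the sorted list passes.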
Polidocanol-induced injury. Polidocanol-induced injury was performed as previously
described. In brief, mice were anaesthetized and delivered one dose (3.5 …) of
1% polidocanol or PBS vehicle control by oropharyngeal aspiration to induce
injury. Tracheae were collected at 2, 3 and 7 days following injury for scRNA-seq
or for fixation and immunofluorescence.
Immunofluorescence, microscopy and cell counting. For RNAscope and
immunofluorescence of paraffin-embedded sections, mouse tracheae were dissected under
sterile conditions and HBEC Transwell cultures were isolated using a biopsy
punch (Integra Miltex, 33-37). Primary human bronchial tissue was obtained
through the International Institute for the Advancement of Medicine. All tissues were
immediately fixed in 10% normal buffered formalin for 18-24 h at room temperature,
then transferred to PBS and kept at 4 °C until paraffin embedding.

For immunofluorescence of mouse trachea, 5-µm sections were baked
and deparaffinized using standard procedures. After antigen retrieval using
citrate buffer (Abcam), sections were rinsed in PBS and blocked in 10% normal
goat serum (NGS) or 10% normal donkey serum (NDS) for 30 min at room
temperature. Primary antibody was added overnight at 4 °C. Sections were washed
3× in PBS for 5 min each, and secondary antibody was added for 1 h at room
temperature. Sections were again rinsed in PBS, followed by Hoechst (1:1,000)
for 30 s. For RNAscope, 5-µm sections were prepared according to RNAscope
procedures for multiplex fluorescent assay (Advanced Cell Diagnostics, 320850)
or dual chromogenic assay (322430). RNAscope probes used were FOXI1 (476351,
483021), CFTR (603291, 499011), and FOXJ1 (430921). Mounting medium and
coverslip were applied and slides were stored at 4 °C for immunofluorescence or
room temperature for chromogenic in situ hybridization. For immunofluorescence
of whole-mount HBEC Transwell cultures, cells were fixed in 4% paraformaldehyde
for 30 min at room temperature, washed 3× 10 min in IF buffer (130 mM NaCl, 7 mM
Na2HPO4, 3.5 mM NaH2PO4, 7.7 mM NaN3, 0.1% bovine serum albumin, 0.2%
Triton X-100, and 0.05% Tween-20), blocked in 10% NGS IF buffer, stained in
primary antibody diluted in 10% NGS IF buffer overnight at 4 °C, washed 3× 20 min
in IF buffer, counterstained in secondary antibody diluted in 10% NGS IF buffer
plus 1:5,000 Hoechst for 1 h at room temperature, washed 3× 20 min in IF buffer
and washed 2× in PBS before mounting. The following
antibodies were used: rabbit anti-FOXI1 (1:200, Sigma-Aldrich HPA071469),
mouse anti-FOXI1 (1:100, Origene TA800146), goat anti-FOXI1 (1:200, Abcam
420454), rabbit anti-ATP6V1B1 (1:100, Sigma-Aldrich HPADSI847), mouse
anti-acetylated tubulin (1:1,000, Sigma-Aldrich T6793), rabbit anti-Scgb1a1 (1:200,
Millipore 07-623), rabbit anti-MUC5B (Santa Cruz sc-20119), rabbit anti-FOXJ1
(1:200, Sigma-Aldrich HPA005714), mouse anti-FOXJ1 (eBioscience, 1:200,
11996580), rabbit anti-FOXN4 (Sigma-Aldrich HPAUSOO1S), mouse anti-ASCL1
(1:100, Becton Dickinson 536604), mouse anti-Krt4 (1:100, Abcam ab9004),
rabbit anti-Krté (1:100, Proteintech 16572-1-AP), rabbit anti-Krt5 (1:250, Abcam
52635), chicken anti-Krt (1:1,000, BioLegend 9059), mouse anti-NGFR (1:200,
Thermo Fisher, MAI-18418) and chicken anti-Krt8 (1:200, Abcam ab107113).
Secondary antibodies used were Alexa Fluor 488, 568 and 647 (Life Technologies)
at 1:500.
Fluorescent images were collected on a confocal microscope (Axiovert 200;
Carl Zeiss), with a 40× objective (Zeiss, Plan-Apochromat 40×/1.3 Ph3 M27), a
Yokogawa CSU-X spinning disc head and an electron-multiplying charge-coupled
device camera (Evolve 512; Photometrics). Scale bars were added, and images
were processed using Zen Blue software (Zeiss) and Photoshop (Adobe). FOXI1+
and FOXJ1+ cells were counted using ImageJ software. Chromogenic signals were
acquired using a Nuance FX multispectral imaging system (Perkin Elmer) with
an Olympus BNL microscope interfaced with a liquid crystal-based camera and
tunable filter from 420 nm to 720 nm at 20-nm intervals. Spectral components
were unmixed and pseudo-coloured for individual channels.
Lentivirus production. For overexpression, FOXI1 (GeneID 2299) was cloned
into the pLenti6/V5-DEST-NGFP Gateway vector, which was generated by
transferring the N-EmGFP ORF from pcDNA6.2/N-EmGFP-DEST (Thermo Fisher,
Cat# V35620) into pLenti6/V5-DEST (Thermo Fisher, V49610). For lentiviral
packaging, 4 × 10^6 293T cells were seeded in a 100-mm poly-D-lysine-coated dish
(Corning BioCoat, 356169) one day before transfection with 14 ml of cell growth
medium (DMEM (Thermo Fisher, Cat# 11965092), 10% FBS (Clontech 631106),
2 mM L-glutamine (Invitrogen 25030), 0.1 mM MEM Non-Essential Amino Acids
(Invitrogen 11140), and 1 mM sodium pyruvate MEM (Invitrogen 11360)). For
transfection, 7 µg of packaging plasmid DNA (ViraPower Lentiviral Packaging
Mix, Thermo Fisher K497500) was mixed with 5 µg of expression construct DNA
and 36 µl FuGENE 6 (Promega, E2691). Opti-MEM (Thermo Fisher, 31985062) was
then added to the mixture to a total volume of 80 … 293T cells were incubated
with the transfection reagent mixture for 24 h before the growth medium was
refreshed. At 72 h after transfection, virus was collected and frozen for future
experiments. Packaged virus was added to HBEC cultures 1 h after cell seeding
and then removed at feeding the following day.
Flow cytometry and cell sorting. Cells were collected using 0.05% Trypsin-EDTA
(Thermo Fisher, 25300054), pelleted at 300g for 5 min, suspended in 2%
FBS DMEM with EDTA and filtered through a 40-µm strainer before being
analysed by flow cytometry or cell sorting. RNA was extracted with Trizol (Invitrogen,
15596026) or RNeasy Mini Kit (Qiagen, 74106). cDNA was synthesized from 1 µg
of RNA with qScript XLT cDNA SuperMix kit (Quanta Biosciences, 95161-100).
Quantitative PCR (qPCR) was carried out using FastStart Universal Probe
Master kit (Roche, 04914058001) with 40 ng of cDNA per reaction. TaqMan probe
sequences used for qPCR (Applied Biosystems) were: FOXI1, Hs00201827_m1;
FOXJ1, Hs00230964_m1; TP63, Hs00978340_m1; GAPDH, Hs99999005_m1; CFTR,
Hs00357011_m1; ATP6V1B1, Hs00266002_m1; ITGA6, Hs01041011_m1; DNAI2,
Hs01001544_m1; SCGB1A1, Hs00171092; MUC5B, Hs00861588_m1; NRARP,
Hs01104102_s1; HES5, Hs01387461_g1; HES1, Hs00172878_m1; MUC5AC,
Hs01365601_m1.

Short-circuit current (I_sc) measurements in Ussing chambers. For Ussing studies,
HBECs were cultured in 6-well Snapwell plates (Corning 3801) at a seeding density
of 3,000 cells per well. Snapwell inserts containing differentiated HBECs were then
mounted in chambers bathed in buffer (Krebs Ringer solution; 400 ml H2O, 25 ml
AM NaCl, 25 ml 0.5 M NaHCO3, 25 ml 666 mM KH2PO4 + 166 mM K2HPO4,
ml 24 mM CaCl2 + 24 mM MgCl2, 0.9 g dextrose). Amiloride (Sigma, A9561)
was added apically at 10 µM to inhibit Na+ absorption, then forskolin (Sigma,
6886) was added apically at 20 µM to stimulate cAMP and, finally, CFTR-172
(Sigma-Aldrich, C2992) inhibitor was added apically and basally at 30 µM. Under
these conditions, cAMP-stimulated I_sc due to addition of forskolin could be
attributed to CFTR-mediated Cl− secretion from basolateral to apical solution.
Statistical analysis. No statistical methods were used to predetermine sample size.
The experiments were not randomized. The investigators were not blinded to
allocation during experiments and outcome assessment.

The standard error of the mean was calculated from the mean of at least three
independent HBEC cultures. The Student's t-test (unpaired, two-tailed) was used to
compare data between groups, with P values of less than 0.05 considered significant.

Pearson correlation and its associated P value between ΔI_sc and FOXI1+ or
FOXJ1+ cell number per mm was calculated using the MATLAB corr function.
Multivariate regression was carried out using the MATLAB fitlm function.
Sensitivity was defined as the fractional change in ΔI_sc induced by fractional change
in FOXI1+ (n_1) or FOXJ1+ (n_2) cell number per mm at the mean ΔI_sc value across all
samples, estimated from the slope and intercept of multivariate regression as

$(\mathrm{d}I/\mathrm{d}n_i)(n_i/I) = [\mathrm{slope}_i]\,(n_i/I)$, with $i = 1$ and $2$ for FOXI1+ and FOXJ1+,
respectively.
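The sensitivity calculation can be illustrated with an ordinary least-squares fit. The study reports using MATLAB for this step; the sketch below is a Python stand-in with hypothetical function names, and it assumes that the sensitivity is evaluated at the sample means of both predictors, which is one reading of the definition above. The numbers are invented toy data.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y = intercept + X @ slopes."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def sensitivities(X, y):
    """Elasticity slope_i * n_i / I, evaluated at the mean predictors (assumption)."""
    intercept, slopes = fit_linear(X, y)
    x_mean = X.mean(axis=0)
    y_at_mean = intercept + slopes @ x_mean
    return slopes * x_mean / y_at_mean

# Toy data: the response depends on the first predictor only.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
y = 1.0 + 2.0 * X[:, 0]
s = sensitivities(X, y)
print(np.round(s, 3))
```

A sensitivity near zero for the second predictor indicates that proportional changes in that cell number leave the fitted response essentially unchanged.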
Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.


Code availability. Python scripts implementing the methods as described can be
obtained upon request.

Data availability. All sequencing data are available in the Gene Expression Omnibus
repository under the accession number GSE102540, the NCBI Sequence Read
Archive under the accession number SRRSS#1096, the Klein laboratory SPRING
viewer (https://kleintools.hms.harvard.edu/paper_websites/airway.2018/) and
the Single Cell Portal (https://portals.broadinstitute.org/single_cell).

Extended Data Fig. 1 | Atlas of transcription factors, surface molecules …
lineages of mouse and human. a, b, … surface molecules in mouse (a) and
human (b) among the list of cell-type-specific genes that met
the following criteria: significantly enriched in lineage (false discovery rate
(FDR) < 5%, permutation test), expressed at >50 transcripts per million
(TPM), expressed in marked lineage at least 1.5× higher than second
highest cluster and highest in marked lineage for 4/4 (mouse) or 2/3
(human) biological replicates. c, Pairwise correlation of cell populations
identified by single-cell RNA-seq. Genes used for correlation analysis
were the 208 most variable genes (identified as described) of all genes
expressed at a level of at least 3 UMI counts in at least 3 cells. Ward's
method was used for hierarchical clustering.

Extended Data Fig. 2 | Gene modules identified in mouse tracheal
lineages. Gene modules were identified by selecting variable genes within
the given population that were correlated with at least 4 other genes with
rank correlation >0.2. a, b, Gene-gene correlation heat map shows 4 gene
modules in mouse airway basal cells (a) and 6 gene modules in mouse
airway secretory cells (b). SPRING plots show where gene modules are
expressed in a given population. Multiple genes are combined in a single
signature defined as the mean rank of expression (dense ranking).

Extended Data Fig. 3 | Gene modules identified in human bronchial
lineages. a, Two major modules of anti-correlated genes were identified
by selecting variable genes within the basal-to-secretory continuum that
were correlated with at least 4 other genes with rank correlation >0.12.
Genes within each module were then separately considered within basal
and secretory cells, keeping genes with a correlation >0.35 with at least 4
other genes. b, c, Gene-gene correlation heat map shows 3 gene modules
in human airway basal cells (b) and 4 gene modules in human airway
secretory cells (c). SPRING plots show where gene modules are active
in a given population. Multiple genes are combined in a single signature
defined as the mean rank of expression (dense ranking).

Extended Data Fig. 4 | Validation of novel lineages in mouse and
human by immunofluorescence. a, Immunofluorescence in mouse
tracheal epithelium for Kr (green, arrowheads), Krt5 (basal), Krt8
(luminal), Scgb1a1 (club, secretory) and Foxj1 (ciliated) (n = 3 mice).
b, Immunofluorescence in differentiated HBEC cultures for FOXN4
(red, arrows), FOXJ1 (arrowheads mark FOXJ1+ cells) and acetylated
tubulin (cilia) (n = 2 donors). c, Immunofluorescence in HBEC cultures
for the ionocyte markers FOXI1, ATP6V1B1 and NGFR (n = 3 donors).
Arrowhead shows apical enrichment of ATP6V1B1. Arrows highlight
lateral protrusions. Scale bar, 20 µm.

Extended Data Fig. 5 | Identification of recovery-specific cell states
and population dynamics during regeneration. a, Cells from uninjured
mouse airway do not equally populate all regions of the SPRING plot of all
mouse data combined. Each cell from the uninjured condition voted for
its 10 nearest neighbours among all mouse cells profiled, and smoothed
vote counts were used as a proxy for uninjured cell density on the map
(two left-most plots). By visual inspection of the smoothed vote distribution,
a threshold of 25 votes was chosen to binarize regions of the SPRING
plot into present versus depleted in uninjured. b, Bar charts representing
abundance of rare populations as a fraction of all cells, over time post
injury. Error bars represent the 95% binomial proportion confidence
interval (normal approximation). Total number of cells = 7,898 from n = 4
mice (uninjured), 898 from n = 1 mouse (1 dpi), 1,964 from n = 1 mouse
(2 dpi), 1,082 from n = 1 mouse (3 dpi) and 2,321 from n = 3 mice (7 dpi).
c, Bar charts showing the fraction of all cells in each population that
express Foxi1 during recovery. Values shown correspond to the fraction of
all cells at each time point (cell and mouse numbers as in b above). Error
bars defined as in b.
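The error bars described in b follow the standard normal-approximation interval for a proportion. A minimal sketch, assuming the usual 95% critical value z = 1.96 and using invented counts, is given below; the function name is hypothetical, and no clipping to [0, 1] is applied.

```python
import math

def binomial_ci(k, n, z=1.96):
    """Normal-approximation confidence interval for the proportion k/n."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = binomial_ci(50, 100)
print(round(low, 3), round(high, 3))
```

For rare populations (small k), this approximation can give a lower bound below zero, which is one reason alternative intervals are sometimes preferred.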
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Extended Data Fig. 6 | Analysis of basal-to-ciiated differentiation 
trajectory following injury. a, Population balance analysis (PBA,
see Methods) was used to order 609 cells highlighted in black along the
pseudotime of their basal-to-ciliated progression, followed by application
of a moving average over a window of 100 cells. The resulting ordering of
averaged cells is referred to as the basal-to-ciliated trajectory. PBA requires
manually selecting source and sink cells for calculating the pseudotime.
b, Heat map of the 1,237 differentially expressed genes along the
basal-to-ciliated trajectory (permutation test, FDR < 5%, fold-change_max > 2;
see Methods). Genes ordered by expression-weighted mean position,
defined for an expression time series $x_t$ as $\langle t \rangle = \sum_t t\,x_t / \sum_t x_t$.
c, Heat map of transcription factors only. Hierarchical clustering revealed
six major clusters of correlated genes. Clusters were ordered by mean
expression-weighted mean position. Plots of up to 5 transcription factors sampled
from each cluster. The y axis shows the average expression of a gene within
the window of 100 cells 5.1 (or +1/[window size] for mean values of
zero), normalized to the maximum value. The total trajectory includes
609 cells.
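The moving average and the expression-weighted mean position used to order genes can be sketched in a few lines. The function names are hypothetical, positions are assumed to be counted from zero, and the weighted-mean formula is the standard definition rather than a confirmed transcription of the one in the caption.

```python
def moving_average(values, window=100):
    """Running mean over consecutive windows; returns len(values) - window + 1 values."""
    if window > len(values):
        raise ValueError("window is longer than the series")
    total = sum(values[:window])
    out = [total / window]
    for i in range(1, len(values) - window + 1):
        total += values[i + window - 1] - values[i - 1]   # slide the window by one cell
        out.append(total / window)
    return out

def weighted_mean_position(series):
    """Expression-weighted mean position: sum(t * x_t) / sum(x_t), with t from 0."""
    return sum(t * x for t, x in enumerate(series)) / sum(series)

averaged = moving_average(list(range(609)), window=100)
print(len(averaged))
```

With 609 ordered cells and a window of 100, the averaged trajectory has 510 positions, which matches the number of averaged cells quoted in the Methods.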
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Extended Data Fig. 7| Specification and characterization of FOXIL 
lineage in human bronchial epithelium. a, HBECs were transduced at
seeding with GFP or GFP:FOXI1 lentivirus, differentiated and sorted
for GFP (shown is representative gating strategy, n = 12). b, Fold change
in transduced cells (GFP+) compared to non-transduced cells (GFP−)
was determined by RT-qPCR normalized to GAPDH. Pooled data from
2 donors transduced with GFP (n = 7 samples) or GFP:FOXI1 (n = …
samples) are represented as mean ± s.e.m. P value: FOXI1, 0.001; CFTR,
0.04; ATP6V1B1, 0.006; FOXJ1, 0.01; SCGB1A1, 0.02; two-tailed t-test.
c, Fluorescent in situ hybridization (RNAscope) for FOXI1 and CFTR in
HBEC culture transduced with GFP or GFP:FOXI1. Note that while there
is an increase in FOXI1/CFTR co-labelled cells, not all FOXI1+ cells express
CFTR (arrowheads versus arrows) (n = 2 independent experiments
in 2 donors). d, e, Chromogenic in situ hybridization (RNAscope) in
primary human bronchial tissue surface epithelium and gland ducts for
CFTR and FOXI1 (d) or FOXJ1 (e). Chromogenic signals were split and
pseudocoloured to reveal individual channels; inset regions are shown
at higher magnification on the right. Note that CFTR is highly enriched
in FOXI1+ but not FOXJ1+ cells (n = 1 donor, 5 regions of bronchial tree
analysed). Scale bars, 20 µm.

Extended Data Fig. 8 | Single-cell RNA-seq analysis of HBECs
transduced with GFP and GFP:FOXI1. a, SPRING plot combining cells
transduced with GFP (n = 9,436) or GFP:FOXI1 (n = 10,330), with each
of the two conditions highlighted in black (total cells n = 19,766). b, The
SLC16A7+ population was identified to be absent in the viral transduction
experiment after mapping single-cell transcriptomes onto the reference
state map. Each cell from the viral transduction experiment voted for its
nearest neighbour in the reference experiment. The bar chart on the right
shows the average number of votes per cluster. c, Cell states unique to the
viral transduction experiment were identified as detailed in Extended Data
Fig. 5a. d, Cells representing states also found in the reference experiment
(conserved cells) inherited the label of their single nearest neighbour in
the reference map. Cells specific to the viral transduction experiment
were divided into four clusters by spectral clustering, with their top five
enriched genes shown in the top part of the heat map (right). Enrichment
of gene g in a population is defined as the fold change in expression of g in
that population versus the second highest expresser. A pseudo value of 10 TPM
was added before calculating the fold change, and only genes expressed at
>50 TPM in at least one cluster were considered. The bottom of the heat map
shows the top 20 enriched genes identified treating all four transduction-specific
states as one population. e, Bar chart showing fold changes in population
size following GFP:FOXI1 versus GFP transduction (extension of Fig. 3).
f, Expression of transgene in identified cell populations.
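The enrichment score defined in d (fold change over the second-highest expresser, with a pseudo value and an expression floor) can be sketched as follows. The function name, the dictionary input and the example values are hypothetical; only the pseudo value of 10 TPM and the >50 TPM filter come from the caption.

```python
def enrichment(cluster_means, pseudo=10.0, min_tpm=50.0):
    """Return (top cluster, fold change versus second-highest cluster) for one gene.

    cluster_means maps cluster name -> mean expression in TPM. Genes that are not
    expressed above min_tpm in at least one cluster are skipped (None).
    """
    if max(cluster_means.values()) <= min_tpm:
        return None
    ranked = sorted(cluster_means.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top), (_, second) = ranked[0], ranked[1]
    return top_name, (top + pseudo) / (second + pseudo)

result = enrichment({"A": 190.0, "B": 40.0, "C": 0.0})
print(result)
```

The pseudo value keeps the ratio finite and damps fold changes that arise only because the second-highest cluster has near-zero expression.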


© 2018 Springer Nature Limited. All rights reserved.


[Figure: SPRING plots of Notch pathway gene expression (counts).]
Extended Data Fig. 9 | Notch pathway component enrichment in
airway lineages. a, b, SPRING plots show enrichment of Notch pathway
components in mouse (a) and human (b) airway lineages. Normalized
counts are shown for the Notch ligands JAG1, JAG2 and DLL1, and the
Notch receptors NOTCH1, NOTCH2 and NOTCH3. The Notch target gene
signature combines HES1, HES5 and NRARP into a single gene signature,
defined as the mean expression rank (dense ranking). All gene expression
and signature values are smoothed (see Methods for smoothing).
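A multi-gene signature defined as the mean dense rank of expression can be sketched as below. The function names and toy values are hypothetical, and the sketch assumes that each gene is ranked across cells before the ranks are averaged per cell, which is one natural reading of the definition.

```python
def dense_rank(values):
    """Dense ranking: equal values share a rank and ranks have no gaps (1, 2, 3, ...)."""
    rank_of = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    return [rank_of[v] for v in values]

def signature(expr_by_gene):
    """Mean dense rank per cell across genes; expr_by_gene maps gene -> per-cell values."""
    ranks = [dense_rank(v) for v in expr_by_gene.values()]
    n_cells = len(ranks[0])
    return [sum(r[c] for r in ranks) / len(ranks) for c in range(n_cells)]

scores = signature({"HES1": [0, 5, 5, 2], "HES5": [1, 1, 3, 0]})
print(scores)
```

Working on ranks rather than raw counts prevents a single highly expressed gene from dominating the combined signature.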
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Extended Data Fig. 10 | Inhibition of Notch signalling decreases 
ionocyte markers in HBECs. a, Expression of Notch target genes and
airway lineage markers in cultures treated with 3.3 µM DAPT compared
to cultures treated with DMSO. Notch target genes (NRARP P = 0.03,
HES5) and secretory cell markers (MUC5B P = 0.001, MUC5AC) are
decreased whereas ciliated cell markers (FOXJ1, DNAI2 P = 0.01) and
basal cell markers (ITGA6 P = 0.006 and TP63) are increased upon DAPT
treatment. Note that ionocyte markers (FOXI1 P = 0.02, CFTR) are also
decreased upon DAPT treatment. Two-tailed t-test; 8 experiments in
2 donors. b, FOXI1+ cell counts in HBEC cultures treated with antibodies
that neutralize individual Notch receptors (n = 5 experiments in 2 donors).
All data are mean ± s.e.m.
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Exosomal PD-L1 contributes to immunosuppression 
and is associated with anti-PD-1 response 


Gang Chen, Alexander C. Huang, Wei Zhang, Gao Zhang, Min Wu, Wei Xu, Zili Yu, Jiegang Yang, Beike Wang,
Honghong Sun, Houfu Xia, Qiwen Man, Wenqun Zhong, Leonardo F. Antelo, Bin Wu, Xuepeng Xiong, Xiaoming Liu,
Lei Guan, Ting Li, Shujing Liu, Ruifeng Yang, Youtao Lu, Liyun Dong, Suzanne McGettigan, Rajasekharan Somasundaram,
Ravi Radhakrishnan, Gordon Mills,
Giorgos C. Karakousis, Tara C. Mitchell

Tumour cells evade immune surveillance by upregulating the
surface expression of programmed death-ligand 1 (PD-L1), which
interacts with programmed death-1 (PD-1) receptor on T cells to
elicit the immune checkpoint response. Anti-PD-1 antibodies
have shown remarkable promise in treating tumours, including
metastatic melanoma. However, the patient response rate is low.
A better understanding of PD-L1-mediated immune evasion is
needed to predict patient response and improve treatment efficacy.
Here we report that metastatic melanomas release extracellular
vesicles, mostly in the form of exosomes, that carry PD-L1 on their
surface. Stimulation with interferon-γ (IFN-γ) increases the amount
of PD-L1 on these vesicles, which suppresses the function of CD8
T cells and facilitates tumour growth. In patients with metastatic
melanoma, the level of circulating exosomal PD-L1 positively
correlates with that of IFN-γ, and varies during the course of
anti-PD-1 therapy. The magnitude of the increase in circulating
exosomal PD-L1 during early stages of treatment, as an indicator
of the adaptive response of the tumour cells to T cell reinvigoration,
stratifies clinical responders from non-responders. Our study
unveils a mechanism by which tumour cells systemically suppress
the immune system, and provides a rationale for the application of
exosomal PD-L1 as a predictor for anti-PD-1 therapy.

Extracellular vesicles, such as exosomes and microvesicles (also
known as shedding vesicles), carry bioactive molecules that influence
the extracellular environment and the immune system. We purified
exosomes from a panel of human primary and metastatic melanoma
cell lines by differential centrifugation, and verified them by transmission
electron microscopy (TEM) and nanoparticle tracking analysis
(NTA) (Fig. 1a, b). Proteins associated with the exosomes were
then analysed by reverse phase protein array (RPPA), a large-scale
antibody-based quantitative proteomics technology. Analysis by
RPPA and western blot revealed the presence of PD-L1 in exosomes,
and its level was significantly higher in exosomes derived from metastatic
melanoma cells compared to those from primary melanoma cells
(Fig. 1c, d, Extended Data Fig. 1a). Iodixanol density gradient centrifugation
further confirmed the association of PD-L1 with the exosomes
(Extended Data Fig. 1b). PD-L1 was also detected in microvesicles, but
at a lower level (Extended Data Fig. 1c-e). PD-L1 was also detected
in extracellular vesicles generated from mouse metastatic melanoma
B16-F10 cells (Extended Data Fig. 1f).

Tumour cell surface PD-L1 can be upregulated in response to IFN-γ
secreted by activated T cells, and PD-L1 binds to PD-1 through its
extracellular domain to inactivate T cells. Using immuno-electron
microscopy and enzyme-linked immunosorbent assay (ELISA)
(Fig. 1e-g), we found that exosomal PD-L1 has the same membrane
topology as cell surface PD-L1, with its extracellular domain exposed
on the surface of the exosomes. Exosomal PD-L1 binds PD-1 in a
concentration-dependent manner, and this interaction can be disrupted
by PD-L1-blocking antibodies (Fig. 1h). Furthermore, the level of
exosomal PD-L1 secreted by melanoma cells increased markedly upon
IFN-γ treatment (Fig. 1f, g, i), and correspondingly, these exosomes
displayed increased binding to PD-1 (Fig. 1h).

Exosomes are generated and released through a defined intracellular
trafficking route. Genetic knockdown of the ESCRT subunit Hrs,
which mediates the recognition and sorting of exosomal cargos,
using short hairpin (sh) RNA led to a decrease in the level of PD-L1
in the exosomes and an increase of PD-L1 in the cell (Extended Data
Fig. 1g, h). In addition, PD-L1 co-immunoprecipitated with Hrs from
cell lysates (Extended Data Fig. 1i). PD-L1 co-localized with Hrs and
CD63, an exosome marker, in melanoma cells (Extended Data Fig. 1).
Knockdown of Rab27A, which mediates exosome release, also
blocked PD-L1 secretion via the exosomes (Extended Data Fig. 1).

To investigate the secretion of exosomal PD-L1 by melanoma cells in
vivo, we established human melanoma xenografts in nude mice. Blood
from these mice was collected for exosome purification and subsequent
detection of human PD-L1 proteins by ELISA (Fig. 2a). Antibodies against
human PD-L1 specifically identified human PD-L1 on the circulating
exosomes from mice bearing human melanoma xenografts but not the
control mice (Fig. 2b, Extended Data Fig. 2a, b). Moreover, the level of
circulating exosomal PD-L1 positively correlated with tumour size (Fig. 2c).

PD-L1 has been found in blood samples derived from melanoma
patients. Recent studies suggest the presence of PD-L1 in extracellular
vesicles isolated from blood samples of patients with cancer, and the level
of PD-L1 correlates with pathological features of these patients. We
purified extracellular vesicles from the plasma of melanoma patients
(Extended Data Fig. 2c-g). The level of PD-L1 on the circulating
exosomes was significantly higher in patients with metastatic melanoma
than in healthy donors (Fig. 2, Extended Data Figs. 2f, 3),
whereas there was no or only marginal difference in the number of
circulating exosomes or the total protein level on these exosomes
(Extended Data Fig. 3c, d). There was less difference in PD-L1
levels in circulating microvesicles compared to the circulating exosomes
(Extended Data Fig. 3). The data analysis and receiver operating
characteristic (ROC) curve show that, among all the parameters tested,
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Fig. 1 | Extrafacial expression of PD-LI on melanoma cell-derived 
exosomes and its regulation by IFN-γ. a, A representative TEM image of
purified exosomes from WM9 cells. Scale bar, 50 nm. b, Characterization
of purified exosomes using nanoparticle tracking. c, RPPA data showing
the levels of PD-L1 in exosomes secreted by primary or metastatic
melanoma cell lines (n = 3 for WM1552C, WM902B, A375, WM164;
n = 4 for WM35, WM793, UACC-903, WM9). See Extended Data Fig. 1a
for statistical analysis. d, Immunoblots for PD-L1 in the whole cell lysate
(W) and purified exosomes (E) from different metastatic melanoma cell
lines. All lanes were loaded with the same amount of total protein.
e, A representative TEM image of WM9 cell-derived exosomes
immunogold-labelled with anti-PD-L1 antibodies. Arrowheads indicate
5-nm gold particles. Scale bar, 50 nm. f, Schematic (left) of ELISA


… indicated cell types. TMB, 3,3',5,5'-tetramethylbenzidine;
SA-HRP, streptavidin-horseradish peroxidase. g, ELISA of PD-L1 on
exosomes from melanoma cells, with or without IFN-γ treatment. h, PD-1
binding of exosomes with IFN-γ or blocking PD-L1 antibody (PD-L1
Ab) (see Methods). i, Western blot analysis of PD-L1 in whole cells and
exosomes from IFN-γ-treated cells and control cells. All lanes were loaded
with the same amount of total protein (left). Quantification of exosomal
PD-L1 by western blotting (right). Results shown represent three (a, b) or
two (d, e) independent experiments. Data are mean ± s.d. of three
(f, h, i) or four (g) independent biological replicates. P values are from
two-sided unpaired t-test (g, i). Full gel source data (d, i) are shown in
Supplementary Fig. 1.

Fig. 2 | The level of PD-L1 on circulating exosomes distinguishes
patients with metastatic melanoma from healthy donors. a, ELISA of
human PD-L1 on exosomes in plasma samples from mice with human
melanoma xenografts. b, Levels of PD-L1 on exosomes isolated from the
plasma samples of control nude mice or mice bearing human WM9
melanoma xenografts, measured by ELISA (n = 10). c, Pearson correlation
between the exosomal PD-L1 in plasma and tumour burden in
xenograft-bearing nude mice (n = 10). d-f, ELISA of circulating exosomal
PD-L1 (d), total PD-L1 (e) or extracellular vesicle (EV)-excluded PD-L1
(f) in healthy donors (HD, n = 11) and melanoma patients (MP, n = 44).
The exosomes were purified using the exosome isolation kit. g, ROC curve
analysis for the indicated parameters in patients with metastatic melanoma
compared to healthy donors. Data are mean ± s.d. P values are from a
two-sided unpaired t-test (b, d-f).
Fig. 3 | Exosomal PD-L1 inhibits CD8 T cells and facilitates the
progression of melanoma in vitro and in vivo. a, Representative
histogram of CFSE-labelled human peripheral CD8 T cells (top left) and
representative contour plots of human peripheral CD8 T cells examined
for the expression of Ki-67 (middle left) and granzyme B (GzmB) (bottom
left) after indicated treatments. The proportions of cells with diluted CFSE
dye, or positive Ki-67 or GzmB expression are shown on the right (


the level of circulating exosomal PD-L1 best distinguished melanoma patients from healthy donors (Fig. 2d–g, Extended Data Fig. 3e, f).

The current model for PD-L1-mediated immunosuppression is based on the interaction between PD-L1 on the tumour cell surface and PD-1 on CD8 T cells. Here we tested whether exosomal PD-L1 inhibits CD8 T cells. First, we used confocal microscopy to show a physical interaction between tumour exosomes and CD8 T cells purified from human peripheral blood (Extended Data Fig. 4a, b). Flow cytometry analyses further indicated that the level of interaction was higher for activated CD8 T cells than for non-activated counterparts (Extended Data Fig. 4c). Moreover, exosomes derived from melanoma cells treated with IFN-γ exhibited a higher level of binding to CD8 T cells (Extended Data Fig. 4d). Next, we tested the effect of exosomal PD-L1 on CD8 T cells, taking advantage of MEL624 cells, which do not express endogenous PD-L1 (Extended Data Fig. 5a–d) and other immunosuppressive proteins such as FasL and TRAIL. Exosomes derived from MEL624 cells expressing exogenous PD-L1 inhibited the proliferation, cytokine production and cytotoxicity of CD8 T cells, as demonstrated by the decreased proportion of cells containing diluted carboxyfluorescein succinimidyl ester (CFSE, a cell division-tracking dye), reduced expression of Ki-67 and granzyme B (GzmB), and the inhibited production of IFN-γ, IL-2 and TNF (Fig. 3a, Extended Data Fig. 5e, f). Pre-treatment of the exosomes with anti-PD-L1 antibodies nearly abolished these effects. Similar effects were observed using exosomes secreted from WM9 cells, which express endogenous PD-L1 (Extended Data Fig. 5e–h). Exosomes derived from mouse melanoma B16-F10 cells also inhibited the proliferation and cytotoxicity of mouse splenic CD8 T cells (Extended Data Fig. 6a–d). Pre-treating OT-I T cells (which specifically recognize OVA peptide) (Extended Data Fig. 6c) with B16-F10 cell-derived exosomes inhibited their ability to kill their target cells (Extended Data Fig. 6f). Extracellular vesicles from human


independent biological experiments). b, Growth curve of PD-L1(KD)
B16-F10 tumours with indicated treatments (n = 7 mice per group). c, The
proportions of Ki-67+ PD-1+ CD8 TILs or splenic or lymph node CD8 T
cells after indicated treatments (n = 6 for tumour samples of the EXO-IgG
group, and n = 7 for all the other groups). See Extended Data Fig. 8d for
representative contour plots. Data are mean ± s.d. (a–c). P values are from
two-sided unpaired t-test (a, c) or two-way ANOVA (b).


lung and breast cancer cells also contain immunosuppressive PD-L1, most of which is in exosomes, and PD-L1 expression is also upregulated by IFN-γ in some of these cell lines (Extended Data Fig. 7a–e).

To examine the effects of exosomal PD-L1 in vivo, we established a syngeneic mouse melanoma model in C57BL/6 mice using B16-F10 cells in which PD-L1 expression had been knocked down (PD-L1(KD) B16-F10) (Extended Data Fig. 8a). Injection of exosomes derived from parental B16-F10 cells promoted the growth of tumours derived from PD-L1(KD) B16-F10 cells, whereas pre-treatment of the exosomes with anti-PD-L1 antibodies, but not with IgG isotype or CD63-blocking antibodies, inhibited the effect (Fig. 3b, Extended Data Fig. 8b, c). The number of tumour-infiltrating CD8 T lymphocytes (TILs) decreased significantly after the injection of exosomes (Fig. 3c, Extended Data Fig. 8d, e). B16-F10 exosomes also decreased the proportion of proliferating PD-1+ CD8 T cells in both spleen and lymph nodes (Extended Data Fig. 8), suggesting that exosomal PD-L1 suppresses anti-tumour immunity systemically.

We then examined the level of PD-L1 on circulating extracellular vesicles in melanoma patients during anti-PD-1 therapy. The pre-treatment level of circulating exosomal PD-L1 was significantly higher in patients who failed to respond to the anti-PD-1 treatment with pembrolizumab (Fig. 4a). The difference was, however, not significant for total circulating PD-L1, and undetectable for PD-L1 on circulating microvesicles, or extracellular vesicle-excluded PD-L1 (Fig. 4b–d). A higher level of circulating exosomal PD-L1 before the treatment was associated with poorer clinical outcomes (Fig. 4e). IFN-γ upregulates exosomal PD-L1, and the pre-treatment levels of IFN-γ were significantly higher in patients who did not respond to pembrolizumab. The level of circulating exosomal PD-L1 positively correlated with the level of circulating IFN-γ and overall tumour burden (Fig. 4f, g), which were shown to be indicative of poor prognosis.
Fig. 4 | The level of circulating exosomal PD-L1 stratifies clinical responders to pembrolizumab and non-responders. a–d, Comparison of the pre-treatment levels of circulating exosomal PD-L1 (a), total PD-L1 (b), microvesicle PD-L1 (c), or extracellular vesicle-excluded PD-L1 (d) between melanoma patients with or without clinical response to pembrolizumab. R, responders, n = 21; NR, non-responders, n = 23. e, Objective response rate (ORR) for patients with high and low pre-treatment levels of circulating exosomal PD-L1. f, g, Pearson correlation of the IFN-γ level (f, n = 27) or overall tumour burden (g, n = 39) to the exosomal PD-L1 level in the plasma of melanoma patients. h, Circulating exosomal PD-L1 at serial time points pre-treatment and on-treatment (n = 38). i, Circulating exosomal PD-L1 in clinical responders (n = 19) and non-responders (n = 20) at serial time points pre- and on-treatment. j, Comparison of
Next, we examined the level of circulating exosomal PD-L1 in patients undergoing pembrolizumab therapy. In clinical responders, there were increased levels of PD-L1 on circulating exosomes, mostly within 6 weeks of therapy (Fig. 4h, i). The level of PD-L1 on microvesicles also increased in the same cohort of patients, but to a lesser extent in comparison to exosomes (Extended Data Fig. 9a). Proliferation and reinvigoration of CD8 T cells peaked at week 3 of treatment and preceded the peaking of exosomal PD-L1 at week 6 (Extended Data Fig. 9b). Moreover, in pembrolizumab-responsive patients, both the absolute value and maximal fold change of Ki-67 in PD-1+ CD8 T cells after 3–6 weeks of treatment positively correlated with those of circulating exosomal PD-L1 (Extended Data Fig. 9c, d). The responders displayed a larger increase in the level of circulating exosomal PD-L1 as early as 3–6 weeks following the initial treatment (Fig. 4j). ROC analysis determined that a fold change of 2.43 in exosomal PD-L1 at week 3–6 stratified patients by clinical response to pembrolizumab (Fig. 4k); a fold change in circulating exosomal PD-L1 greater than 2.43 at week 3–6 was associated with a better response to anti-PD-1 therapy by objective response rate (ORR), progression-free and overall survival (Fig. 4l, Extended Data Fig. 9e). The fold increase of total circulating PD-L1, microvesicle PD-L1, and extracellular vesicle-excluded PD-L1 was inferior to that of exosomal PD-L1 for distinguishing responders from non-responders (Fig. 4k, m–o, Extended Data Fig. 9f–h).
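
The stratification rule described above can be illustrated with a minimal sketch. It assumes the fold change is the maximum on-treatment level at weeks 3–6 divided by the pre-treatment level; the function names and the example values are hypothetical, and only the 2.43 cut-off comes from the text.

```python
# Illustrative sketch only: function names and example inputs are assumptions.
# The 2.43 cut-off is the value reported from the ROC analysis in the text.

def max_fold_change(pre_treatment, on_treatment_levels):
    """Largest on-treatment level (weeks 3-6) relative to the pre-treatment level."""
    if pre_treatment <= 0:
        raise ValueError("pre-treatment level must be positive")
    if not on_treatment_levels:
        raise ValueError("at least one on-treatment measurement is required")
    return max(on_treatment_levels) / pre_treatment


def stratify(pre_treatment, on_treatment_levels, cutoff=2.43):
    """Return 'high' when the fold change exceeds the cut-off, else 'low'."""
    fold = max_fold_change(pre_treatment, on_treatment_levels)
    return "high" if fold > cutoff else "low"
```

For example, a patient whose exosomal PD-L1 rises from 1.0 to 3.0 ng/ml would fall in the "high" group under these assumptions, whereas a rise to exactly 2.43 ng/ml would not, because the text specifies "greater than 2.43".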

Our studies suggest that melanoma cells release PD-L1-positive
extracellular vesicles into the tumour microenvironment and


the maximum fold change of circulating exosomal PD-L1 at week 3–6 between clinical responders and non-responders. k, ROC curve analysis for the maximum fold change of circulating exosomal PD-L1 at week 3–6 in clinical responders compared to non-responders. AUC, area under curve. l–o, ORR for patients with high and low fold changes of
circulation to counter the anti-tumour immunity systemically. Since exosomal PD-L1-mediated T cell inhibition can be blocked by antibodies against either PD-L1 or PD-1, our results raise the possibility that disrupting the interaction between exosomal PD-L1 and PD-1 on T cells is a previously unrecognized mechanism in PD-L1/PD-1 blockade-based therapies. The level of PD-L1 on extracellular vesicles is upregulated by IFN-γ, and PD-L1 on extracellular vesicles primarily targets PD-1+ CD8 T cells, which represent the antigen-experienced T cells that secrete IFN-γ. Exosomal PD-L1 may therefore reflect the dynamic interplay between tumour and immune cells. Besides PD-L1, other extracellular vesicle proteins such as FasL may also contribute to immunosuppressive effects. However, PD-L1 enables exosomes to target predominantly PD-1+ CD8 T cells, allowing tumour cells to counteract the immune pressure at the effector stage. In addition to the interaction between exosomal PD-L1 and PD-1, the involvement of other molecules including B7 and CD28 in this process also warrants investigation.

Our study suggests that circulating exosomal PD-L1 prior to and during pembrolizumab treatment may reflect distinct states of anti-tumour immunity. The pre-treatment PD-L1 level may correlate with a role of exosomal PD-L1 in immune dysfunction. High levels of exosomal PD-L1 may reflect the 'exhaustion' of T cells to a stage at which they can no longer be reinvigorated by anti-PD-1 treatment. In on-treatment patients, however, an increase in the level of exosomal PD-L1, following and correlating positively with T cell reinvigoration, would reflect the presence of a successful anti-tumour immunity elicited by the anti-PD-1 therapy. Although the increase in exosomal PD-L1 in response to IFN-γ could enable tumour cells to adaptively inactivate CD8 T cells, this is futile because the interaction between PD-L1 and PD-1 is blocked by pembrolizumab. We observed no marked increase in exosomal PD-L1 in non-responders. This could be a result of failure to elicit an adequate T cell response or a resistance mechanism to IFN-γ from tumours. Tumour cells in non-responders may have adaptively downregulated their response to IFN-γ to avoid the detrimental increase in antigen presentation and to escape the anti-proliferative effects induced by IFN-γ.

Our study offers a rationale for developing circulating exosomal PD-L1 as a predictor for the clinical outcomes of anti-PD-1 therapy, and sheds light on possible causes for the failure of anti-PD-1 therapies experienced by many patients (Extended Data Fig. 10). Tumour PD-L1 has been used as a predictive biomarker for clinical responses to anti-PD-1 therapy. Considering the heterogeneity and dynamic changes of PD-L1 expression in tumours, and the invasive nature of tumour biopsy, developing exosomal PD-L1 as a blood-based biomarker could be an attractive option.
METHODS
Cell culture. The A375 human melanoma and B16-F10 mouse melanoma cells were purchased from ATCC. The control and PD-L1-overexpressing human melanoma MEL624 cells were provided by H. Dong (Mayo Clinic). Mouse melanoma B16 cells stably expressing chicken OVA (B16-OVA) were provided by H. C. J. Ertl (The Wistar Institute). The UACC-903 human melanoma cells were provided by M. Powell (Stanford University). The melanoma cell lines WM1552C, WM35, WM793, WM902B, WM9 and WM164 presented in this study were established in M. Herlyn's laboratory (The Wistar Institute). All cell lines were authenticated by DNA fingerprinting, and were tested routinely before use to avoid mycoplasma contamination. Human melanoma cell lines MEL624, PD-L1/MEL624, WM1552C, WM35, WM902B, WM793, UACC-903, WM9, A375 and WM164 were cultured in RPMI 1640 medium (Invitrogen) supplemented with 10% (v/v) fetal bovine serum (FBS) (Invitrogen). B16-F10 and B16-OVA cells were cultured in DMEM (Sigma) supplemented with 10% (v/v) FBS. For stimulation with IFN-γ, cells were incubated with 100 ng/ml of recombinant human or mouse IFN-γ (Peprotech) for 48 h.

Generation of stable Hrs, Rab27a or PD-L1 knockdown melanoma cells. Short hairpin RNAs (shRNAs) against human Hrs (also known as HGS) (NM_001712, GCACGTCTTTCCAGAATTCAA, GCATGAAGAGTAACCACAGC), human RAB27A (NM_004850, GCTGCCAATGGGACAAACATA, CAGGAGAGGTTTCGTAGCTA) (gift from A. Weaver, Vanderbilt University), mouse PD-L1 (also known as Cd274) (NM_021893, GCGTTGAAGATACAAGCTCAA) or scrambled shRNA control (Addgene) were packaged into lentiviral particles using 293T cells co-transfected with the viral packaging plasmids. Lentiviral supernatants were harvested 48–72 h after transfection. Cells were infected with filtered lentivirus and selected by 2 μg/ml puromycin.

Patients and specimen collection. Patients with stage III to IV melanoma (Supplementary Table 1) were enrolled for treatment with pembrolizumab (2 mg/kg by infusion every 3 weeks) under an Expanded Access Program at Penn (http://clinicaltrials.gov identifier NCT0208348) or with commercial Keytruda. Patients gave consent in writing for blood collection under the University of Pennsylvania Abramson Cancer Center's melanoma research program tissue collection protocol UPCC 08607 in accordance with the ethics committee and the Institutional Review Board of the University of Pennsylvania. Peripheral blood was obtained in sodium heparin tubes before each pembrolizumab infusion every 3 weeks for 12 weeks. Clinical response was determined as best response based on immune-related RECIST (irRECIST) using unidimensional measurements. The assessment of clinical responses for patients was performed independently in a double-blind fashion. Blood samples from healthy donors were collected at The Wistar Institute after approval by the ethics committee and Institutional Review Board of The Wistar Institute. Written consent was obtained from each healthy donor before blood collection. All experiments involving blood samples from healthy donors were performed in accordance with relevant ethical regulations.

Flow cytometry of patients' PBMCs. Peripheral blood mononuclear cells (PBMCs) were isolated using Ficoll gradient and stored using standard protocols. Cryopreserved PBMC samples from pretreatment, cycles 1–4 (weeks 3–12) were thawed and analysed by flow cytometry as previously described. In brief, live or dead cell discrimination was performed using Live/Dead Fixable Aqua Dead Cell Stain Kit (Life Technologies). Cell surface staining was performed for 30 min at 4 °C. Intracellular staining was performed for 60 min on ice after using a fixation/permeabilization kit (eBioscience).

Purification of extracellular vesicles. For exosome purification from cell culture supernatants, cells were cultured in media supplemented with 10% exosome-depleted FBS. Bovine exosomes were depleted by overnight centrifugation at 100,000g. Supernatants were collected from 48–72 h cell cultures and extracellular vesicles were purified by a standard differential centrifugation protocol. In brief, culture supernatants were centrifuged at 2,000g for 20 min to remove cell debris and dead cells (Beckman Coulter, Allegra X-14R). Microvesicles were pelleted after centrifugation at 16,500g for 45 min (Beckman Coulter, J2-HS) and resuspended in PBS. Supernatants were then centrifuged at 100,000g for 2 h at 4 °C (Beckman Coulter, Optima XPN-100). The pelleted exosomes were suspended in PBS and collected by ultracentrifugation at 100,000g for 2 h.

For purification of circulating extracellular vesicles by differential centrifugation, venous citrated blood from melanoma patients or healthy donors was centrifuged at 1,550g for 30 min to obtain cell-free plasma (Beckman Coulter, Allegra X-14R). Then, 1 ml of the obtained plasma was centrifuged at 16,500g for 45 min (Eppendorf, 5418R). The pelleted microvesicles were suspended in PBS. The collected supernatants were then centrifuged at 100,000g for 2 h at 4 °C (Beckman Coulter, Optima MAX-XP) to pellet the exosomes. For purification of circulating exosomes using the exosome isolation kit, cell-free plasma was first centrifuged at 16,500g for 45 min (Eppendorf, 5418R) to pellet large membrane vesicles. Exosomes were then purified from the supernatants using the exosome isolation kit (Invitrogen, Cat# 4484450).
Characterization of purified exosomes. For verification of purified exosomes using electron microscopy, purified exosomes suspended in PBS were dropped on formvar carbon-coated nickel grids. After staining with 2% uranyl acetate, grids were air-dried and visualized using a JEM-1011 transmission electron microscope. For immunogold labelling, purified exosomes suspended in PBS were placed on formvar carbon-coated nickel grids, blocked, and incubated with the mouse anti-human monoclonal antibody that recognizes the extracellular domain of PD-L1 (clone 5H1-A3), followed by incubation with the anti-mouse secondary antibody conjugated with protein A-gold particles (5 nm). Each staining step was followed by five PBS washes and ten ddH2O washes before contrast staining with 2% uranyl acetate.

The size and concentration of exosomes purified from cell culture supernatants or patients' plasma were determined using a NanoSight NS300 (Malvern Instruments), which is equipped with fast video capture and particle-tracking software.

For iodixanol density gradient centrifugation, exosomes harvested by differential centrifugation were loaded on top of a discontinuous iodixanol gradient (10%, 20% and 40%, made by diluting 60% OptiPrep aqueous iodixanol with 0.25 M sucrose in 10 mM Tris) and centrifuged at 100,000g for 18 h at 4 °C (Beckman Coulter, Optima MAX-XP). Twelve fractions of equal volume were collected from the top of the gradients, with the exosomes distributed at the density range between 1.13 and 1.19 g/ml, as previously demonstrated. The exosomes were further pelleted by ultracentrifugation at 100,000g for 2 h at 4 °C.
Immunoprecipitation. To analyse the role of ESCRT machinery in exosomal secretion of PD-L1 in melanoma cells, PD-L1/MEL624 cells were transfected with Flag-Hrs plasmid vector and then lysed. The cleared lysate was incubated with Anti-FLAG Affinity Gel (Sigma-Aldrich) overnight at 4 °C. The immunoprecipitated proteins were resolved by SDS-polyacrylamide gel electrophoresis and transferred to nitrocellulose membranes. PD-L1 and Flag (Hrs) were determined by western blot using specific antibodies.
ELISA. For detection of PD-L1 on extracellular vesicles, cell supernatants or patient plasma, ELISA plates (96-well) (Biolegend) were coated with 0.25 μg per well (100 μl) of monoclonal antibody against PD-L1 (clone 5H1-A3) overnight at 4 °C. Free binding sites were blocked with 200 μl of blocking buffer (Pierce) for 1 h at room temperature. Then 100 μl of plasma samples with or without extracellular vesicle removal, or extracellular vesicle samples purified from plasma or cell culture supernatants, were added to each well. The exosome or microvesicle samples purified from cell culture supernatants were prepared by serial dilution according to the total protein level to analyse the enrichment of PD-L1 on exosomes and microvesicles. The concentration of PD-L1 on the surface of exosomes isolated from indicated cells was calculated based on the linear range of the ELISA assay data. The exosome or microvesicle samples derived from the plasma samples of healthy donors or melanoma patients were prepared using the same volume of PBS as the plasma from which they were originally derived. The plasma samples with (extracellular vesicle-excluded) or without (total) extracellular vesicle removal were diluted with PBS in 1:0.75 volume ratio. After overnight incubation at 4 °C, biotinylated monoclonal PD-L1 antibody (clone MIH1, eBioscience) was added to each well and incubated for 1 h at room temperature. A total of 100 μl per well of horseradish peroxidase-conjugated streptavidin (BD Biosciences) diluted in PBS containing 0.1% BSA was then added and incubated for 1 h at room temperature. Plates were developed with tetramethylbenzidine (Pierce) and stopped with 0.5 N H2SO4. The plates were read at 450 nm with a BioTek plate reader. Recombinant human PD-L1 protein (R&D Systems, Cat# 156-B7) was used to make a standard curve. Recombinant P-selectin protein (R&D Systems, Cat# 137-PS) was used as a negative control to verify the detection specificity. The result of the standard curve demonstrated that the established ELISA exhibited a reliable linear detection range from 0.2 to 12 ng/ml.
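
As a rough illustration of how a concentration can be read off a standard curve within the reported linear range (0.2 to 12 ng/ml), the sketch below fits a straight line to standards and interpolates a sample. It is an assumption-laden example, not the authors' analysis: the function names and the standard readings are hypothetical, and an ordinary least-squares line stands in for whatever fitting the plate-reader software performed.

```python
# Illustrative sketch only: names and standard values are hypothetical.
# Only the 0.2-12 ng/ml linear range is taken from the text.

def fit_standard_curve(concentrations, absorbances):
    """Ordinary least-squares line: absorbance = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate(absorbance, slope, intercept, low=0.2, high=12.0):
    """Concentration in ng/ml, or None when it falls outside the linear range."""
    conc = (absorbance - intercept) / slope
    return conc if low <= conc <= high else None


# Hypothetical standards lying exactly on a line (A450 = 0.1 * conc + 0.05).
standards = [0.2, 1.0, 3.0, 6.0, 12.0]
readings = [0.1 * c + 0.05 for c in standards]
slope, intercept = fit_standard_curve(standards, readings)
```

Samples that interpolate outside the linear range return `None` here, which mirrors the usual practice of re-assaying at a different dilution rather than extrapolating.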

For detection of IFN-γ, TNF and IL-2, the supernatant of human CD8 T cells was harvested and measured according to the kit manufacturer's instructions (Biolegend).

PD-1–PD-L1 binding assay. To test the binding of exosomal PD-L1 to PD-1, 100 μl of exosome samples of different concentrations were captured onto PD-L1 antibody (clone 5H1-A3)-coated 96-well ELISA plates by overnight incubation at 4 °C. Then 100 μl of 4 μg/ml biotin-labelled human PD-1 protein (BPS Bioscience, Cat# 71108) was added and incubated for 7 h at room temperature. A total of 100 μl per well of horseradish peroxidase-conjugated streptavidin (BD Biosciences) diluted in PBS containing 0.1% BSA was then added and incubated for 1 h at room temperature. Plates were developed with tetramethylbenzidine (Pierce) and stopped using 0.5 N H2SO4. The plates were read at 450 nm with a BioTek plate reader. Recombinant human PD-L1 protein directly coated onto the plates was used as the positive control.

Treatment of CD8 T cells with the exosomes. To block PD-L1 on the exosome surface, purified exosomes (200 μg) were incubated with PD-L1 blocking antibodies (10 μg/ml) or IgG isotype antibodies (10 μg/ml) in 100 μl PBS, and then washed with 30 ml PBS and pelleted by ultracentrifugation to remove the non-bound free antibodies. Human CD8 T cells purified from peripheral blood using immunodepletion on a Ficoll-Hypaque gradient (RosetteSep, StemCell Technologies) or mouse CD8 T cells purified from splenocytes using Dynabeads Untouched Mouse CD8 Cells Kit (Invitrogen) were stimulated with anti-CD3 (2 μg/ml) and anti-CD28 (2 μg/ml) antibodies for 4 h and then incubated with human melanoma cell-derived exosomes or mouse B16-F10 cell-derived exosomes with or without PD-L1 blocking for 4 h in the presence of anti-CD3/CD28 antibodies. For human CD8 T cells (2 × 10⁵ cells/well in a 96-well plate), 25 μg/ml of human WM9 cell-derived exosomes (carrying surface PD-L1 at a level of 0.05 ng per μg of exosomes as determined by ELISA, Fig. 1f) were used, as the circulating exosomal PD-L1 level in melanoma patients is around 1.25 ng/ml (Fig. 2d). For mouse CD8 T cells (2 × 10⁵ cells/well in a 96-well plate), 100 μg/ml of mouse B16-F10 cell-derived exosomes (carrying surface PD-L1 at a level of 0.0163 ng per μg of exosomes as determined by ELISA) were used, as the circulating exosomal PD-L1 level in mice bearing B16-F10 tumours is around 1.63 ng/ml. The treated cells were then collected, stained, and analysed by flow cytometry. Information about the primary antibodies is included in Supplementary Table 2. To assay for the proliferation of CD8 T cells, CFSE, a dye for the tracking of cell division (Molecular Probes), was used. A total of 1 × 10⁶ CD8 T cells were stained with CFSE at 5 μM. The cells were then incubated at 37 °C for 20 min and the reaction was stopped by adding 5 volumes of cold medium with 10% FBS, and treated as above. Unstimulated CFSE-labelled cells served as a non-dividing control.
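
The exosome doses in this section appear to be chosen so that the PD-L1 delivered in culture matches the circulating level: 25 μg/ml of exosomes at 0.05 ng PD-L1 per μg gives 1.25 ng/ml, and the mouse target of 1.63 ng/ml at 100 μg/ml implies a loading of about 0.0163 ng per μg. The short sketch below only checks that arithmetic; the function names are hypothetical.

```python
# Illustrative arithmetic check; function names are assumptions.

def pdl1_concentration(exosome_ug_per_ml, pdl1_ng_per_ug):
    """PD-L1 concentration (ng/ml) delivered by a given exosome dose."""
    return exosome_ug_per_ml * pdl1_ng_per_ug


def implied_loading(target_ng_per_ml, exosome_ug_per_ml):
    """PD-L1 loading (ng per ug exosomes) needed to reach a target concentration."""
    return target_ng_per_ml / exosome_ug_per_ml


human_dose = pdl1_concentration(25, 0.05)   # WM9 exosomes, human CD8 T cells
mouse_loading = implied_loading(1.63, 100)  # B16-F10 exosomes, mouse CD8 T cells
```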

The exosome–T cell binding assay. To verify the physical interactions between melanoma cell-derived exosomes and CD8 T cells, purified exosomes were stained with CFSE in 100 μl PBS, and then washed with 10 ml PBS and pelleted by ultracentrifugation. Unstimulated or stimulated human CD8 T cells (2 × 10⁵ cells/well in 96-well plates) were treated with CFSE-labelled exosomes (25 μg/ml) for 2 h and then fixed for flow cytometry or confocal microscopy after immunostaining for CD8 T cells.

Generation of dendritic cells from bone marrow. Dendritic cells (DCs) were generated from bone marrow of C57BL/6 mice and cultured in RPMI 1640 with 10% (v/v) FBS, 20 mM L-glutamine, 50 μM β-mercaptoethanol, 20 ng/ml IL-4 and 20 ng/ml GM-CSF. After 3 days, half of the culture medium was replaced by fresh medium containing 40 ng/ml IL-4 and 40 ng/ml GM-CSF. To prime antigen-specific OT-I CD8 T cells, DCs were subsequently loaded with 2 μg/ml SIINFEKL (OVA257–264) peptide overnight.

CD8 T cell-mediated tumour cell killing assay. To determine the effects of melanoma cell-derived exosomes on the ability of CD8 T cells to kill tumour cells, CD8 T cells were purified from the splenocytes of OT-I mice expressing a transgene encoding a T cell receptor that specifically recognizes SIINFEKL peptide bound to MHC-I H-2Kb. OT-I CD8 T cells (4 × 10⁵ cells/well in a 48-well plate) were then activated by incubation with SIINFEKL-loaded (2 μg/ml) bone marrow-derived DCs (2 × 10⁵ cells/well). The activated OT-I CD8 T cells (4 × 10⁵ cells/well in a 48-well plate) were treated with PBS (as a control) or B16-F10-derived exosomes (100 μg/ml for 48 h) with or without IgG isotype or PD-L1 antibody blocking (10 μg/ml) and then co-cultured with CFSE-labelled melanoma PD-L1(KD) B16-OVA cells (4 × 10⁴) in 48-well plates for 48 h at an effector-to-target (E:T) ratio of 10:1. Cells were then harvested, intracellularly stained with BV650-conjugated antibody against cleaved caspase-3 (BD Biosciences) and analysed by flow cytometry. Information about the primary antibodies is included in Supplementary Table 2.
Immunofluorescence staining. Immunofluorescence staining was performed on fixed cells or formalin-fixed, paraffin-embedded (FFPE) sections. For fixed cells, permeabilization with 0.1% Triton X-100 was performed before blocking with bovine serum albumin (BSA) buffer for 1 h. For FFPE sections, antigen retrieval by steaming in citrate buffer (pH = 6.0) was performed before blocking. The fixed cells or FFPE sections were incubated with primary antibodies overnight at 4 °C, followed by incubation with fluorophore-conjugated secondary antibodies for 1 h. Nuclei were stained with DAPI. Samples were observed using a Nikon confocal microscope at 100× magnification.

Western blot analysis. Whole cell lysates or exosomal proteins were separated using 12% SDS-PAGE and transferred onto nitrocellulose membranes. The blots were blocked with 5% non-fat dry milk at room temperature for 1 h, and incubated overnight at 4 °C with the corresponding primary antibodies at dilutions recommended by the supplier, followed by incubation with HRP-conjugated secondary antibodies (Cell Signaling Technology) at room temperature for 1 h. The blots on the membranes were developed with ECL detection reagents (Pierce). CD63, Hrs, Alix and TSG101 were used as exosome markers. TYRP-1 and TYRP-2 were used as melanoma-specific markers. GAPDH was used as the loading control. Information about the primary antibodies is included in Supplementary Table 2.
Quantitative PCR (qPCR). Total RNA was isolated from CD8 T cells using TRIzol Reagent (Invitrogen), and reverse transcribed into first-strand complementary DNA (cDNA) with random primers using the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scientific). The samples were then analysed in an Applied Biosystems QuantStudio 3 Real-Time PCR system. GAPDH was used as an internal control. Information about the primers is included in Supplementary Table 3.


In vivo mice study. All animal experiments were performed according to protocols approved by the Institutional Animal Care and Use Committee (IACUC) of the University of Pennsylvania. For establishing the human melanoma xenograft model in nude mice, WM9 cells (5 × 10⁶ cells in 100 μl medium) were injected into the flanks of 8-week old female athymic nude mice. Tumours were measured using a digital caliper and the tumour volume was calculated by the formula: (width)² × length/2. Mice were euthanized 30 days after cell inoculation or if the longest dimension of the tumours reached 2.0 cm before 30 days. Immediately following euthanasia, blood samples were harvested by cardiac puncture, and exosomes were purified and detected by ELISA using the aforementioned method. Exosomes purified from sex-, age- and weight-matched healthy nude mice without xenograft were used as the control.
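
The caliper-based volume estimate and the size endpoint used in the mouse studies can be written out as a small sketch. It assumes the formula is the usual modified-ellipsoid form, (width)² × length/2, with both dimensions in cm; the function names are hypothetical, and the 2.0 cm limit is the euthanasia threshold stated for the syngeneic model in this section.

```python
# Illustrative sketch only: function names are assumptions.

def tumour_volume(width_cm, length_cm):
    """Volume in cm^3 from caliper measurements: (width)^2 x length / 2."""
    return width_cm ** 2 * length_cm / 2


def reached_size_endpoint(width_cm, length_cm, limit_cm=2.0):
    """True once the longest tumour dimension reaches the size limit."""
    return max(width_cm, length_cm) >= limit_cm
```

For instance, a tumour measuring 1.0 cm by 2.0 cm has an estimated volume of 1.0 cm³ and has reached the 2.0 cm endpoint.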

For establishing the syngeneic mouse melanoma model in C57BL/6 mice, B16-F10 cells or B16-F10 PD-L1(KD) cells (5 × 10⁵ cells in 100 μl medium) were subcutaneously injected into immunocompetent C57BL/6 mice. Based on the difference in the level of circulating exosomal PD-L1 between mice bearing parental B16-F10 and PD-L1(KD) B16-F10 tumours (1.63 ng/ml vs 0.70 ng/ml), a total of 100 μg of parental B16-F10 cell-derived exosomes (carrying surface PD-L1 at a level of 0.0163 ng per μg of exosomes) with or without IgG isotype, CD63 or PD-L1 blocking (10 μg/ml) were injected into mice after inoculation of PD-L1(KD) B16-F10 cells to examine the functional significance of PD-L1. The dose of 100 μg exosomes used for our in vivo study was equivalent to approximately 30% of the physiological level of circulating exosomes in mice, and was also comparable to those from a palpable tumour in mice according to our data. Tail vein injections of exosomes (100 μg in 100 μl PBS) were performed every 3 days. Mice were weighed every 3 days. Tumours were measured using a digital caliper and the tumour volume was calculated by the formula: (width)² × length/2. The mice were euthanized before the longest dimension of the tumours reached 2.0 cm. Mice were allocated randomly to each treatment group. Downstream analyses of mouse samples (immunofluorescence staining, flow cytometry and ELISA) were performed in a blinded fashion. For flow cytometry, the spleen and tumour samples were harvested, and single cell suspensions were prepared and red blood cells were lysed using ACK Lysis Buffer. Information about the primary antibodies is included in Supplementary Table 2.
Reverse phase protein array (RPPA). RPPA was performed at the MD Anderson Cancer Center core facility using 50 μg protein per sample. All of the antibodies were validated by western blot. Methods for data analysis are described below.
Statistical analyses. RPPA data analysis was performed according to the protocol from the MD Anderson Cancer Center. Specifically, relative protein levels for each sample were determined by interpolation of each dilution curve from the standard curve (supercurve) of the slide (antibody). Supercurve is constructed by a script in R written by the RPPA core facility. The package binaries of SuperCurve and SuperCurveGUI are available in R-Forge (https://r-forge.r-project.org/R/?group_id=1899). These values are defined as supercurve log2 value. All the data points were normalized for protein loading and transformed to linear value, designated as 'normalized linear'. Normalized linear value was transformed to the log2 value, and then median-centred for further analysis. Median-centred values were obtained by subtracting the median of all samples in a given protein. All of the above-mentioned procedures were performed by the RPPA core facility. The normalized data provided by the RPPA core facility were analysed by Cluster 3.0 (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/) and visualized using the Java TreeView 1.05 (http://jtreeview.sourceforge.net).
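
The last two normalization steps described above (log2 transformation of the 'normalized linear' values, then median-centring across samples for a given protein) can be sketched as follows. This is only an illustration of the arithmetic, not the RPPA core facility's script; the function name and input values are hypothetical.

```python
# Illustrative sketch only: not the core facility's R script.
import math
import statistics


def median_centre_log2(normalized_linear):
    """Log2-transform one protein's values across samples, then subtract
    the median of those log2 values from each sample."""
    log_values = [math.log2(v) for v in normalized_linear]
    centre = statistics.median(log_values)
    return [v - centre for v in log_values]


# Hypothetical normalized linear values for one protein in three samples.
centred = median_centre_log2([1.0, 2.0, 4.0])
```

After centring, a value of 0 marks the median sample for that protein, and each unit corresponds to a two-fold difference.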

All other statistical analyses were performed using GraphPad Prism v.6.0. Normality of distribution was determined by D'Agostino-Pearson omnibus normality test and variance between groups was assessed by the F-test. For normally distributed data, significance of mean differences was determined using two-tailed paired or unpaired Student's t-tests; for groups that differed in variance, unpaired t-test with Welch's correction was performed. For data that were not normally distributed, non-parametric Mann-Whitney U-tests or Wilcoxon matched-pairs tests were used for unpaired and paired analysis, respectively. Correlations were determined by Pearson's r coefficient. Two-way ANOVA was used to compare mouse tumour volume data among different groups. Log-rank and Wilcoxon tests were used to analyse the mouse survival data. Error bars shown in graphical data represent mean ± s.d. A two-tailed value of P < 0.05 was considered statistically significant.
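The choice of two-group test described above follows a simple decision rule, which can be sketched as below. The function name choose_test and its boolean arguments are hypothetical; in practice the flags would come from the normality test and the F-test, which this sketch does not perform.

```python
def choose_test(normal: bool, paired: bool, equal_variance: bool = True) -> str:
    """Return the name of the two-group test implied by the rules in the text.

    normal: both groups passed the normality test.
    paired: the observations are matched pairs.
    equal_variance: the F-test found no difference in variance
                    (only relevant for unpaired, normally distributed data).
    """
    if normal:
        if paired:
            return "paired Student's t-test"
        if equal_variance:
            return "unpaired Student's t-test"
        return "unpaired t-test with Welch's correction"
    # Non-parametric alternatives for data that are not normally distributed.
    if paired:
        return "Wilcoxon matched-pairs test"
    return "Mann-Whitney U-test"
```

For example, choose_test(normal=True, paired=False, equal_variance=False) selects the Welch-corrected test, and any non-normal unpaired comparison falls through to the Mann-Whitney U-test.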

Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

Data availability. All data and materials are available from the authors upon reasonable request.
Extended Data Fig. 1 | Melanoma cells release extracellular vesicles carrying PD-L1. a, The log2-transformed RPPA data showing a higher level of exosomal PD-L1 secreted by metastatic melanoma cell lines compared with primary melanoma cell lines. Data represent mean ± s.d. of four primary (WM1552C, WM35, WM793, WM902B) or metastatic (UACC-903, 1205Lu, WM9, WM164) melanoma lines. b, Density gradient centrifugation confirming that PD-L1 secreted by WM9 cells co-fractionated with exosome markers CD63, Hrs, Alix and TSG101. c, Immunoblots for PD-L1 in the whole cell lysate (W), purified exosomes (E) or microvesicles (M) from different metastatic melanoma cell lines. The same amount of protein was loaded in each lane. d, Levels of PD-L1 on the exosomes or microvesicles derived from melanoma cells as assayed by ELISA. e, The levels of exosomal PD-L1 and microvesicle PD-L1 produced by an equal number of melanoma cells. f, Immunoblots for PD-L1 in the whole cell lysate, purified exosomes or microvesicles from mouse melanoma B16-F10 cells. The same amount of protein was loaded in each lane. g, h, Western blot analysis of PD-L1 in Hrs knockdown cells without (g) or with (h) IFN-γ treatment. Quantification of the western blotting data (g, right; h, right). i, Co-immunoprecipitation of PD-L1 and Hrs from MEL624 cells expressing exogenous PD-L1 and Hrs. j, Immunofluorescence staining of intracellular PD-L1 and exosome marker Hrs in WM9 cells treated with IFN-γ. k, Immunofluorescence staining of intracellular PD-L1 and CD63 in WM9 cells treated with IFN-γ. l, Western blotting analysis showing intracellular accumulation of PD-L1, and decreased exosomal secretion of PD-L1 in WM9 cells with RAB27A knockdown (left). The levels of exosomal PD-L1 were compared (right). Two experiments were repeated independently with similar results (b, c, f, i-k). Data represent mean ± s.d. of four (d, e) or three (g, h) independent biological replicates. Statistical analysis is performed by two-sided unpaired t-test (a, d, e, g, h, l). For gel source data (b, c, …), see Supplementary Fig. 1.
Extended Data Fig. 2 | Melanoma cells secrete exosomal PD-L1 into the circulation. a, The monoclonal antibodies against the extracellular domain of human PD-L1 specifically detect human exosomal PD-L1, but not mouse exosomal PD-L1 (n = 3 biologically independent experiments). b, Levels of human PD-L1 in exosomes from the plasma of control nude mice (n = 10) and human WM9 melanoma xenograft-bearing nude mice (n = 10) per mg of total circulating exosomal proteins. c, Characterization of circulating exosomes purified from the plasma of a patient with stage IV melanoma using NanoSight nanoparticle tracking analysis. d, Characterization of circulating microvesicles purified from the plasma sample of a patient with stage IV melanoma using NanoSight nanoparticle tracking analysis. e, Immunoblot for PD-L1 in the microvesicles purified from the plasma samples of 8 patients with stage IV melanoma (denoted as P1-P8). f, Immunoblots for PD-L1 in the exosomes purified from the plasma samples of 3 healthy donors and 5 patients with stage IV melanoma (left panel). Quantification of the levels of exosomal PD-L1 by western blot analysis (right panel). Results are expressed as the percentage of the mean value of healthy donors. g, Standard density gradient centrifugation analysis showing that circulating PD-L1 co-fractionated with exosome markers Hrs and TSG101 and melanoma-specific marker TYRP-2. Three (c, d) or two (e-g) experiments were repeated independently with similar results. Data represent mean ± s.d. (a, b, f). Statistical analyses were performed using two-sided unpaired t-test (b, f). For gel source data (e-g), see Supplementary Fig.
Extended Data Fig. 3 | The number or bulk protein level of circulating exosomes shows no or modest difference between healthy donors and patients with metastatic melanoma. a, ELISA showing the level of PD-L1 on circulating exosomes purified from healthy donors (HD, n = 11) and melanoma patients (MP, n = 44). The exosomes were purified using differential centrifugation. b, Pearson correlation between the ELISA-detected levels of PD-L1 on circulating exosomes purified by differential centrifugation or using the commercial exosome isolation kit (n = 44). c, Comparison of the number of circulating exosomes between healthy donors (n = 10) and melanoma patients (n = 38). d, Comparison of the protein content of circulating exosomes between healthy donors (n = 10) and melanoma patients (n = 38). e, ELISA of the circulating level of microvesicle PD-L1 in healthy donors (HD, n = 11) and melanoma patients (MP, n = 44). f, Detailed data associated with the ROC curve analysis depicted in Fig. 2g. Data represent mean ± s.d. Statistical analyses are performed by two-sided unpaired t-test (a, c-e).
Extended Data Fig. 4 | Melanoma cell-derived exosomes bind to CD8 T cells on their surface. a, Representative contour plots showing the general gating strategy used to identify the purified CD8 T cells (CD3+CD8+CD4−) from human peripheral blood. b, Confocal microscopy analysis of human peripheral CD8 T cells (stimulated with anti-CD3/CD28 antibodies) after incubation with CFSE-labelled WM9 cell-derived exosomes for 2 h. The experiments were repeated three times independently with similar results. c, Representative histogram of human peripheral CD8 T cells with or without anti-CD3/CD28 antibody stimulation after incubation with CFSE-labelled WM9 cell-derived exosomes for 2 h (left). The proportion of exosome-bound cells is shown (right). d, Representative histogram of human peripheral CD8 T cells (stimulated with anti-CD3/CD28 antibodies) after incubation with the same number of CFSE-labelled exosomes purified from control or IFN-γ-treated WM9 cells for 2 h (left panel). The proportion of EXO-bound cells is shown in the right panel. Data represent mean ± s.d. of four (c) or three (d) independent biological replicates. Statistical analyses are performed using two-sided unpaired t-test (c, d).
Extended Data Fig. 5 | Functional inhibition of CD8 T cells by exosomal PD-L1. a, The log2-transformed RPPA data showing the levels of PD-L1 in the exosomes secreted by control (MEL624) or PD-L1-expressing (PD-L1/MEL624) human melanoma MEL624 cells (bottom). b, Immunoblots for PD-L1 in the whole cell lysate (W) or in the purified exosomes (E) from MEL624 or PD-L1/MEL624 cells. The same amount of protein was loaded in each lane. The experiments were repeated two times independently with similar results. For source data, see Supplementary Fig. 1. c, PD-L1 on the surface of exosomes secreted by MEL624 or PD-L1/MEL624 cells as determined by ELISA. d, Levels of PD-L1 on exosomes secreted by MEL624 or PD-L1/MEL624 cells, as measured by ELISA. e, qPCR analyses of IL-2, IFN-γ and TNF in human peripheral CD8 T cells (stimulated with anti-CD3/CD28 antibodies) after treatment with MEL624 cell-derived exosomes, PD-L1/MEL624 cell-derived exosomes or WM9 cell-derived exosomes with or without blocking by IgG isotype or the anti-PD-L1 antibodies. The relative mRNA expression level was calculated as the ratio to the control cells. f, ELISA of IL-2, IFN-γ and TNF in human peripheral CD8 T cells (stimulated with anti-CD3/CD28 antibodies) after treatment with MEL624 cell-derived exosomes, PD-L1/MEL624 cell-derived exosomes or WM9 cell-derived exosomes with or without blocking by IgG isotype or PD-L1 antibodies. g, Representative histogram of CFSE-labelled human peripheral CD8 T cells (stimulated with anti-CD3/CD28 antibodies) after treatment with WM9 cell-derived exosomes with or without antibody blocking (left). The proportion of cells with diluted CFSE dye is shown (right). h, Representative contour plots of human peripheral CD8 T cells (stimulated with anti-CD3/CD28 antibodies) examined for the expression of granzyme B (GzmB) after treatment with WM9 cell-derived exosomes with or without antibody blocking (left). The percentage of GzmB+ CD8 T cells stimulated with anti-CD3/CD28 antibodies is shown at the right panel. Data represent mean ± s.d. of three (a, c, e, f, h) or four (d, g) independent biological replicates. Statistical analyses are performed using two-sided unpaired t-test (d-h).
Extended Data Fig. 6 | Exosomal PD-L1 secreted by mouse melanoma B16-F10 cells inhibits the proliferation and cytotoxicity of mouse splenic CD8 T cells. a, Representative contour plots showing the general gating strategy used to identify the purified CD8 T cells (CD3+CD8+CD4−) from mouse splenocytes. b, Representative histogram of CFSE-labelled mouse splenic CD8 T cells (stimulated with anti-CD3/CD28 antibodies) after treatment with B16-F10 cell-derived exosomes with or without blocking by IgG isotype or the anti-PD-L1 antibodies (left). The proportion of cells with diluted CFSE dye is shown at the right panel. c, Representative contour plots of mouse splenic CD8 T cells (stimulated with anti-CD3/CD28 antibodies) examined for the expression of Ki-67 and granzyme B (GzmB) after treatment with B16-F10 cell-derived exosomes with or without blocking by IgG isotype or the anti-PD-L1 antibodies (left). The percentage of Ki-67+GzmB+ CD8 T cells stimulated with anti-CD3/CD28 antibodies is shown (right). d, Representative contour plots of mouse splenic CD8 T cells (stimulated with anti-CD3/CD28 antibodies) examined for the expression of Ki-67 and GzmB after treatment with B16-F10 cell-derived exosomes in the presence or absence of anti-PD-1 blocking antibodies (left). The percentage of Ki-67+GzmB+ CD8 T cells stimulated with anti-CD3/CD28 antibodies is shown at the right panel. e, OT-I CD8 T cell-mediated tumour cell killing assay was performed in B16-OVA cells with PD-L1 knockdown, or B16-F10 cells with PD-L1 knockdown (negative control). Apoptosis of tumour cells was evaluated by flow cytometric analysis of intracellular cleaved caspase-3 (left) and the relative cytotoxicity was calculated (right). f, OT-I CD8 T cells, activated by OVA-pulsed bone marrow-derived dendritic cells and treated with PBS (as a control), exosomes derived from B16-F10 cells with or without IgG isotype or PD-L1 antibody blocking, were co-cultured with PD-L1 knockdown B16-OVA cells for 48 h. Tumour cell apoptosis was evaluated by flow cytometric analysis of intracellular cleaved caspase-3 (left) and the relative cytotoxicity was calculated (right). Data represent mean ± s.d. of three (b-f) independent biological replicates. Statistical analyses are performed using two-sided unpaired t-test (b-f).
Extended Data Fig. 7 | Lung cancer and breast cancer cells release extracellular vesicles carrying PD-L1. a, Immunoblots for PD-L1 in the whole cell lysate (W), purified exosomes (E) or microvesicles (M) from different lung cancer cell lines. The same amounts of proteins were loaded for each fraction. b, Immunoblots for PD-L1 in the whole cell lysate, purified exosomes or microvesicles from the breast cancer cell line MDA-MB-231. The same amount of protein was loaded for each fraction. c, Immunoblots for PD-L1 in the whole cell lysate (WCL) or in the purified exosomes (EXO) from control (C) or IFN-γ-treated (IFN) lung cancer cells. The same amounts of exosome proteins from IFN-γ-treated and control cells were loaded (left). Quantification of the exosomal PD-L1 level determined by western blot analysis (right). d, Immunoblots for PD-L1 in the whole cell lysate or in the purified exosomes from control or IFN-γ-treated breast cancer MDA-MB-231 cells. The same amounts of exosome proteins from IFN-γ-treated and control cells were loaded (left). Quantification of the exosomal PD-L1 level determined by western blot analysis (right). e, Representative contour plots of human peripheral CD8 T cells examined for the expression of Ki-67 and GzmB after treatment with H1264 cell-derived exosomes with or without blocking by IgG isotype or PD-L1 antibodies (left). The percentage of Ki-67+ or GzmB+ CD8 T cells is shown (right). The experiments were repeated twice independently with similar results (a, b). Data represent mean ± s.d. of three (c-e) independent biological replicates. Statistical analyses are performed using two-sided unpaired t-test (c-e). For source data (a-d), see Supplementary Fig.
in vivo. a, Representative flow cytometric histograms of B16-F10 cells examined for the expression of PD-L1 with or without PD-L1 knockdown. B16-F10 cells were stably depleted of PD-L1 using lentiviral shRNA against PD-L1 (shPD-L1) or the scrambled control shRNA (shCTL). The experiment was repeated twice independently with similar results. b, Representative images showing the growth of PD-L1 knockdown B16-F10 tumours in C57BL/6 mice after indicated treatments. Experiments were performed using 7 mice for each group. c, The weights of PD-L1 knockdown B16-F10 tumours from C57BL/6 mice with

d, Representative contour plot of CD8 TILs or splenic or lymph node CD8 T cells examined for the expression of Ki-67 after indicated treatments. Experiments were performed using 7 mice for each group. See Fig. 3c for quantification data. e, Representative immunofluorescence images of CD8 TILs in tumour tissues (left). The number of CD8 TILs for each mouse (n = 7 mice per group) were quantified from 5 high-power fields (HPF) (right). Statistical analysis is performed using two-sided unpaired t-test (c, e).
Extended Data Fig. 9 | The level of circulating exosomal PD-L1 distinguishes clinical responders to pembrolizumab treatment from non-responders. a, The levels of PD-L1 on circulating microvesicles at serial time points pre- and on-treatment (n = 39). b, The frequency of PD-1+Ki-67+ CD8 T cells and the level of circulating exosomal PD-L1 in clinical responders at serial time points pre- and on-treatment (n = 8). c, Pearson correlation of the maximum level of circulating exosomal PD-L1 at week 3-6 to the maximum frequency of PD-1+Ki-67+ CD8 T cells at week 3-6 in clinical responders (n = 8) and non-responders (n = 11). d, Pearson correlation of the maximum fold change of circulating exosomal PD-L1 level at week 3-6 to the maximum fold change of PD-1+Ki-67+ CD8 T cells at week 3-6 in clinical responders (n = 8) and non-responders (n = 11). e, Kaplan-Meier progression-free and overall survival of patients with high (n = 11) and low (n = 12) fold changes of circulating exosomal PD-L1 at 3-6 weeks. f, Comparison of the maximum fold change of total circulating PD-L1 at week 3-6 between the clinical responders and non-responders. R, responders, n = 19; NR, non-responders, n = 20. g, Comparison of the maximum fold change of circulating microvesicle PD-L1 at week 3-6 between the clinical responders (n = 19) and non-responders (n = 20). h, Comparison of the maximum fold change of extracellular excluded PD-L1 at week 3-6 between the clinical responders (n = 19) and non-responders (n = 20). Data represent mean ± s.d. Statistical analyses were performed using two-sided paired t-test (a), log-rank test (e), or two-sided unpaired t-test (f-h).
A multiprotein supercomplex controlling oncogenic signalling in lymphoma

B cell receptor (BCR) signalling has emerged as a therapeutic target in B cell lymphomas, but inhibiting this pathway in diffuse large B cell lymphoma (DLBCL) has benefited only a subset of patients1. Gene expression profiling identified two major subtypes of DLBCL, known as germinal centre B cell-like and activated B cell-like (ABC), that show poor outcomes after immunochemotherapy in ABC. Autoantigens drive BCR-dependent activation of NF-κB in ABC DLBCL through a kinase signalling cascade of SYK, BTK and PKCβ to promote the assembly of the CARD11-BCL10-MALT1 adaptor complex, which recruits and activates IκB kinase. Genome sequencing revealed gain-of-function mutations that target the CD79A and CD79B BCR subunits and the Toll-like receptor signalling adaptor MYD88, with MYD88(L265P) being the most prevalent isoform. In a clinical trial, the BTK inhibitor ibrutinib produced responses in 37% of cases of ABC1. The most striking response rate (80%) was observed in tumours with both CD79B and MYD88(L265P) mutations, but how these mutations cooperate to promote dependence on BCR signalling remains unclear. Here we used genome-wide CRISPR-Cas9 screening and functional proteomics to determine the molecular basis of exceptional clinical responses to ibrutinib. We discovered a new mode of oncogenic BCR signalling in ibrutinib-responsive cell lines and biopsies, coordinated by a multiprotein supercomplex formed by MYD88, TLR9 and the BCR (hereafter termed the My-T-BCR supercomplex). The My-T-BCR supercomplex colocalizes with mTOR on endolysosomes, where it drives pro-survival NF-κB and mTOR signalling. Inhibitors of BCR and mTOR signalling cooperatively decreased the formation and function of the My-T-BCR supercomplex, providing mechanistic insight into their synergistic toxicity for My-T-BCR+ DLBCL cells. My-T-BCR supercomplexes characterized ibrutinib-responsive malignancies and distinguished ibrutinib responders from non-responders. Our data provide a framework for the rational design of oncogenic signalling inhibitors in molecularly defined subsets of DLBCL.
We used a library of small guide RNAs (sgRNAs) to conduct genome-wide loss-of-function CRISPR-Cas9 screens for essential genes in lymphoid cell lines engineered with inducible Cas9. We screened three ibrutinib-sensitive ABC lines, one ibrutinib-insensitive ABC line, and four ibrutinib-insensitive germinal centre B cell-like (GCB) lines, as well as two multiple myeloma and one T cell lymphoma line as controls (Extended Data Fig. 1a, Supplementary Tables 1, 2). For each gene, we derived a gene-level statistic that we term a CRISPR screen score (CSS), which is, in essence, the number of standard deviations away from the average effect of inactivating a gene (Supplementary Table 3, Methods).
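Read this way, a CRISPR screen score behaves like a z-score. The sketch below illustrates that idea only; the exact derivation is in the Methods and is not reproduced here. It assumes that each gene has a single numeric effect value (for instance a log2 fold change in sgRNA abundance) and that the population standard deviation is used; both are assumptions of this example, and the function name and gene labels are hypothetical.

```python
from statistics import mean, pstdev


def crispr_screen_scores(gene_effects):
    """Return, per gene, the number of standard deviations its effect
    lies from the average effect across all genes (a z-score)."""
    values = list(gene_effects.values())
    mu = mean(values)
    sigma = pstdev(values)  # population s.d.; an assumption of this sketch
    return {gene: (effect - mu) / sigma for gene, effect in gene_effects.items()}


# Made-up effect values for four hypothetical genes.
effects = {"gene_a": -4.0, "gene_b": 0.0, "gene_c": 0.0, "gene_d": 4.0}
scores = crispr_screen_scores(effects)
```

On this reading, a strongly negative score marks a gene whose inactivation is unusually toxic compared with the average gene in that cell line.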

Non-targeting control sgRNAs were not toxic, whereas sgRNAs targeting pan-essential genes were depleted in all lines (Extended Data Fig. 1b). Among genes encoding B-cell transcription factors, we observed DLBCL subtype-specific dependencies in both GCB (MEF2B, TCF3, IRF8 and SPI1) and ABC (IRF4, SPIB and BATF) cell lines (Extended Data Fig. 1c). Results from a validation screen using approximately 10 sgRNAs per gene (Supplementary Tables 4, 5) were strongly correlated with those from the genome-wide screens (P < 0.0001; Supplementary Table 6, Extended Data Fig. 2).

Most DLBCL lines depended on the BCR subunits CD79A and CD79B (Fig. 1), but engaged divergent downstream survival pathways. ABC lines uniquely relied on NF-κB regulators and on JAK1/STAT3 signalling triggered by the NF-κB-dependent cytokine IL-10. By contrast, BCR signalling in GCB lines was NF-κB-independent, but shared a dependence on PI3K/mTOR signalling with ABC cells, albeit using different signalling adapters (PIK3AP1 in ABC, CD19 in GCB). The BCR signalling mode in GCB is similar to that observed in another germinal-centre-derived malignancy, Burkitt lymphoma, which was previously known as tonic signalling because it resembled tonic, NF-κB-independent BCR signalling in naive mouse B cells. However, GCB and Burkitt cell lines depended on both CD19 and LYN (Fig. 1, Extended Data Fig. 3), which are not required for tonic signalling in mouse B cells, so we instead term this phenomenon constitutive germinal centre BCR signalling.

The survival of BCR-dependent ABC lines relied on Toll-like receptor 9 (TLR9), which coordinates MYD88 signalling in innate immune cells, and on two chaperones that regulate the subcellular localization of TLR9, CNPY3 and UNC93B1. TLR9 was the only essential TLR in ABC lines (Extended Data Fig. 2b). We validated these findings using time-dependent toxicity assays in 12 DLBCL lines transduced with vectors that co-express sgRNAs and green fluorescent protein (GFP) (Fig. 2a). As expected, ABC lines expressing mutant isoforms of MYD88 were sensitive to MYD88 deletion. By contrast, TLR9, and its chaperones UNC93B1 and CNPY3, were only essential in ABC lines with MYD88(L265P) and either a CD79A or CD79B mutation. These double-mutant lines were also particularly sensitive to BTK inhibition (Supplementary Table 1).

We next investigated copy number and gene expression levels of TLR pathway genes in 574 DLBCL tumours. ABC tumours had recurrent single copy gains or amplifications involving MYD88, TLR9, CNPY3 and UNC93B1, all of which were more highly expressed in ABC tumours and their expression correlated with copy number (Fig. 2b, Extended Data Fig. 4a, Supplementary Table 7). Altogether, 49.7% of ABC tumours had increased copy number of one or more of these genes (Fig. 2b, Supplementary Table 8), with CNPY3 and UNC93B1 demonstrating minimal common amplified regions of 1.1 Mb and 27 kb, respectively (Extended Data Fig. 4b, Supplementary Table 9). These data provide genetic evidence that the TLR pathway contributes to the ABC phenotype.

Fig. 1 | Genes essential for oncogenic signalling in lymphoma. Icons indicate essential genes from CRISPR screens coloured by the average CSS in GCB-dependent (orange) or BCR-dependent (blue) ABC DLBCL lines.

To determine TLR9 function in ABC DLBCL, we expressed a fusion protein linking TLR9 to BioID2, a promiscuous biotin ligase that biotinylates proteins within approximately 10 nm. Biotinylated proteins in TLR9-BioID2-expressing ABC cells were purified and compared to proteins from control cells by SILAC (stable isotope labelling by amino acids in cell culture)-based quantitative mass spectrometry. To define the TLR9 interactome that is essential in ABC DLBCL, we compared the enrichment of each protein quantified by mass spectrometry with its respective CSS metric (Fig. 2c). The TLR9-essential interactome confirmed the association of TLR9 with MYD88 and CNPY3, but also revealed interactions with the BCR subunits CD79A and CD79B (Fig. 2, Extended Data Fig. 4c-e, Supplementary Tables 10, 11). The IgM component of the endogenous BCR co-immunoprecipitated with TLR9 in three ABC lines more than in a GCB line (Fig. 2). By contrast, neither TLR4 nor TLR7 co-immunoprecipitated with IgM (Extended Data Fig. 5a). TLR9 associated with IgM in an intracellular fraction of ABC cells rather than a plasma membrane fraction (Extended Data Fig. 5b), suggesting that the BCR and TLR9 might cooperate at an intracellular location.

Fig. 2 | TLR9 couples BCR signalling and mutant MYD88. a, Toxicity of sgRNAs in DLBCL lines normalized to day 0. WT, wild type. b, Copy number gain or amplification of indicated genes in ABC biopsies. c, TLR9-BioID interactome in HBL1 cells versus CSS. Bait (TLR9) is labelled in blue; essential interactors are labelled red; essential interactors also in TMD8 cells are labelled dark red. d, TLR9 co-immunoprecipitates with IgM in ABC lines (HBL1, TMD8 and OCI-Ly10). OCI-Ly19 is a GCB line. e, f, Left, confocal images of PLAs (red) showing the IgM-TLR9 (e) or TLR9-MYD88 (f) interaction in HBL1 cells. Cells were counterstained with DAPI (blue) and wheat germ agglutinin (WGA; green). Right, PLA scores after knockdown of indicated genes. ***P < 0.001;

To visualize where TLR9 and the BCR interact, we used proximity ligation assays (PLA), which identify proteins within tens of nanometres of each other. An IgM-TLR9 PLA produced fluorescent puncta in the cytoplasm of ABC cells that were reduced by depletion of CD79A or TLR9 (Fig. 2e, Extended Data Fig. 5c). The IgM-TLR9 PLA signal was present across a panel of BCR-dependent ABC lines, with higher signal in double-mutant lines, whereas BCR-independent ABC and GCB lines had substantially lower signals (Extended Data Fig. 5d-f). IgG-TLR9 PLA gave no detectable signal (Extended Data Fig. 5g). IgM-TLR9 PLA signals co-localized with the endolysosomal marker LAMP1 (Extended Data Fig. 5h, i), consistent with the dependence of these ABC lines on UNC93B1 and CNPY3, which facilitate TLR9 entry into LAMP1+ endolysosomes. Ectopic expression of TLR9, wild-type MYD88 or the MYD88(L265P) mutant increased the IgM-TLR9 PLA signal (Extended Data Fig. 5), suggesting that TLR9/MYD88 copy number gains in ABC tumours could augment BCR-TLR9 cooperation.

Knockdown of TLR9 decreased NF-κB-dependent gene expression and reduced IκB kinase activity in ABC lines with MYD88(L265P), confirming the role of TLR9 in oncogenic NF-κB signalling (Extended Data Fig. 6). TLR9-MYD88 PLA puncta were visible in the cytoplasm of ABC lines, but were diminished by the knockdown of TLR9, MYD88 or CD79A, suggesting that the BCR facilitates recruitment of MYD88 to TLR9 (Fig. 2f).

These results suggest that TLR9 coordinates signalling between the BCR and MYD88. We propose that the BCR, TLR9 and MYD88 nucleate a signallosome that activates NF-κB, which we will term the MyD88-TLR9-BCR (My-T-BCR) supercomplex. To identify additional components of the My-T-BCR supercomplex, we expressed a MYD88(L265P)-BioID2 protein in three ABC lines and performed mass spectrometry analysis of MYD88-proximal biotinylated proteins. We identified proteins biotinylated in all three lines and used their CSS scores to define the essential MYD88 interactome, which included the BCR (CD79B), mTOR, PLCγ2 and the CARD11-BCL10-MALT1 (CBM) complex (Fig. 3a, Extended Data Fig. 7a, b, Supplementary Tables 12-14). Streptavidin pulldown and immunoblot analysis confirmed CARD11 and MALT1 biotinylation in ABC cells with MYD88(L265P)-BioID2 (Extended Data Fig. 7c, d).

Finding the CBM complex in proximity to MYD88 was unexpected since these adaptors are thought to independently promote NF-κB activation. Both MALT1-MYD88 and BCL10-MYD88 PLAs yielded robust cytoplasmic puncta in ABC cells, confirming the association of endogenous MYD88 with the CBM complex (Fig. 3b, Extended Data Fig. 7). These PLA signals were reduced by knockdown of CD79A, TLR9 and CARD11, suggesting that BCR and TLR9 signalling cooperate to assemble MYD88 and the CBM into a supercomplex. Moreover, CARD11-BCL10 PLA puncta were reduced by the knockdown of TLR9 or MYD88 in double-mutant cell lines, demonstrating that TLR9 signalling controls CBM complex assembly in ABC cells (Fig. 3, Extended Data Fig. 7).

NF-κB is activated by IκB kinase (IKK)-dependent phosphorylation of IκBα. By PLA, both IgM and TLR9 associated with phosphorylated IκBα (p-IκBα) in the cytoplasm of ABC cells, which was reduced by knockdown of CD79A, TLR9 or MYD88 (Fig. 3d, e). Thus, NF-κB activation is closely associated with the My-T-BCR supercomplex.

We next visualized the subcellular location of the My-T-BCR supercomplex by staining ABC cells bearing MYD88(L265P)-BioID2 with fluorescently labelled streptavidin. The MYD88-BioID2 signal defined large cytoplasmic structures that co-localized with p-IKK, consistent with active NF-κB signalling at these sites (Fig. 3f, Supplementary Video 1). These complexes extended into the cytoplasmic space from the surface of LAMP1+ vesicles. BCR was visualized by cell-surface labelling of IgM with a fluorescent Fab fragment on ice, followed by brief warming to allow internalization. The LAMP1+ vesicles with MYD88-BioID2 signals also contained IgM, suggesting a dynamic shuttling of the BCR from the plasma membrane to the intracellular site of My-T-BCR supercomplex formation.

Fig. 3 | The My-T-BCR supercomplex coordinates NF-κB activation. a, MYD88(L265P)-BioID interactome in TMD8 cells versus CSS. Bait (MYD88(L265P)) is labelled in blue; essential interactors are labelled red; essential interactors in at least two ABC lines are labelled dark red. b-e, Left, confocal image of PLAs (red) showing interaction of MALT1-MYD88 (b), CARD11-BCL10 (c), IgM-p-IκBα (d) and TLR9-p-IκBα (e). Cells were counterstained with DAPI (blue) and WGA (green). Right, PLA scores in HBL1 cells after shRNA knockdown of the indicated genes. ***P < 0.001; see Methods. f, Confocal images of MYD88(L265P)-BioID-transduced HBL1 or TMD8 cells stained as indicated. Scale bar, 10 µm.

Given that the My-T-BCR supercomplex coordinates pro-survival signalling in ABC DLBCL, we hypothesized that inhibition of BTK activity by ibrutinib might disrupt this signalling complex. Ibrutinib reduced puncta of the My-T-BCR supercomplex in ABC lines bearing MYD88(L265P)-BioID2 (Extended Data Fig. 7h). To globally assess the effect of ibrutinib on the My-T-BCR supercomplex, we treated two ABC lines bearing MYD88(L265P)-BioID2 with ibrutinib and analysed the biotinylated proteins by mass spectrometry. Interactions of MYD88 with the CBM complex (CARD11), PLCγ2 and mTOR were disrupted by ibrutinib (Fig. 4a, Extended Data Fig. 7i and Supplementary Tables 14, 15).

‘The ibrutinib-senstive association of mTOR with MYDSS suggested 
that signalling by the My-T-BCR supercomplex might affect pathways 
controlled by mfOR. Of note, components ofthe Ragulator complex 
(LAMTORI, LAMTOR3, LAMTOR4 and RRAGA), which regulates 
mTORC! activity atthe lysosomal membrane! were biotinylated 
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by TLR9-BiolD2, as were components of the lysosomal V-ATPase 
(ATP6V1B2 and ATP6VOD1), which regulates the miTORCI response 
to amino acids'® (Fig. 2c, Extended Data Fig. 4c, Supplementary 
Tables 10, 11). In ABC lines with MYD88(1265P)-BiolD2, mTOR 
localized to LAMPI* vesicles, often in proximity to the My-T-BCR 
supercomplex (Fig 4b). PLA in three ABClines confirmed that ibruti 

nib decreased association of endogenous MYD88 with mTOR, MALTI 
and BCL10 (Fig. 4). Ibrutinib also decreased the association of IgM 


Fig. 4 | mTOR is an essential component of the My-T-BCR supercomplex.
a, MYD88(L265P)-BioID2 interactome in OCI-Ly10 cells treated with
ibrutinib (10 nM) or DMSO. Ibrutinib-sensitive interactions are
labelled in red. Bait (MYD88(L265P)) is labelled in blue. b, Confocal
images of mTOR (green), LAMP1 (red) and MYD88(L265P)-BioID2 (cyan,
streptavidin) in ABC cells. Scale bar, 1 μm. c, PLA scores for
indicated protein interactions in ABC lines treated with ibrutinib
or DMSO. d, Normalized MYD88(L265P)-BioID2 intensity per cell in ABC
lines treated as indicated for 24 h. e, Immunoblot using the
indicated antibodies of ABC lines treated with indicated drugs for
24 h. f, Synergistic toxicity scores in TMD8 cells treated with
ibrutinib or acalabrutinib together with the indicated drugs.
g, Growth of TMD8 xenografts in NSG mice treated as indicated.
NS, not significant; **P < 0.01; see Methods. Double hash symbols
(##) denote mouse death.


and p-IκBα, but had mixed effects on IgM association with TLR9.
These findings suggest that IgM trafficking to TLR9+ endolysosomes
is constitutive, but interaction of the My-T-BCR supercomplex with
mTOR, the CBM complex, and NF-κB is controlled by BTK-dependent
BCR signalling.

Given the proximity of MYD88 and mTOR, we investigated the
effect of mTOR inhibition on the My-T-BCR supercomplex. In
MYD88(L265P)-BioID2-expressing ABC cells, formation of the


Fig. 5 | The My-T-BCR supercomplex identifies ibrutinib-responsive
lymphomas. a, b, IgM-TLR9 PLA puncta per cell in GCB and ABC biopsies
(a) or indicated lymphoma biopsies (normalized to TMD8 signal = 100) (b).
CLL, chronic lymphocytic leukaemia; LPL, lymphoplasmacytic lymphoma;
MCL, mantle cell lymphoma; PCNSL, primary central nervous system
lymphoma; WM, Waldenström macroglobulinaemia. PLA data normalized
to TMD8 control. c, Representative IgM-TLR9 PLA images of ABC biopsies.
Top, bright-field images; bottom, fluorescence images. Scale bar, 10 μm.
PD, progressive disease; PR, partial response. d, IgM-TLR9 PLA of
biopsies from DLBCL patients treated with ibrutinib. Red denotes
responders (complete response (CR)/PR/stable disease (SD)); grey
denotes non-responders (PD). e, Modes of My-T-BCR supercomplex
signalling and constitutive germinal centre (GC) BCR signalling.
*P < 0.05, **P < 0.01, ***P < 0.001; see Methods.


My-T-BCR supercomplex was reduced by ibrutinib, but was further
attenuated by the addition of AZD2014, an mTORC1/2 inhibitor
(Fig. 4d). Dual mTOR and BTK inhibition cooperatively decreased
MYD88 protein levels and blocked mTOR activity, as assessed by
p-4E-BP1 and p-S6 kinase, as well as NF-κB activation, as assessed
by p-IKK (Fig. 4e). These data provide mechanistic insights into the
synergism between BTK inhibitors and drugs targeting mTOR or PI3K
in ABC models growing in vitro (Fig. 4f) and in vivo (Fig. 4g).

Finally, we examined whether the My-T-BCR supercomplex is detectable
in primary lymphoma biopsy samples, and if its presence might
be associated with ibrutinib responsiveness. We optimized the PLA
for use in formalin-fixed biopsy samples using a tissue microarray of
81 lymphoma cell lines. IgM-TLR9 PLA signals were highest in ABC
lines with chronic active BCR signalling, with little if any PLA signal
in other lymphoma lines or normal B cells present in tonsils or
reactive lymph nodes (Extended Data Fig. 8b, Supplementary Table 17).
Among DLBCL biopsies, ABC cases had significantly more IgM-TLR9
puncta than GCB cases (Fig. 5a). High IgM-TLR9 PLA signals were also
observed in the biopsies of primary central nervous system lymphoma,
Waldenström macroglobulinaemia, and its relative, lymphoplasmacytic
lymphoma (Fig. 5b). These malignancies commonly have MYD88(L265P)
and/or CD79A or CD79B mutations, and respond frequently to
ibrutinib. Of two Waldenström macroglobulinaemia lines tested, one had
My-T-BCR supercomplexes, and knockdown of the BCR (CD79A) or
TLR9 was selectively toxic for this line (Extended Data Fig. 9a-c).
My-T-BCR supercomplexes were not evident in mantle cell lymphoma or
chronic lymphocytic leukaemia samples (Fig. 5b), suggesting that these
malignancies rely on a qualitatively distinct form of BCR signalling.

We next examined eight available biopsies from patients with
relapsed or refractory DLBCL enrolled on a clinical trial of ibrutinib
monotherapy. We adapted the IgM-TLR9 PLA to allow
immunohistochemical identification of CD20+ lymphoma cells (Fig. 5c).
Three ABC cases and one unclassified DLBCL scored positive in the
IgM-TLR9 PLA while three other ABC cases and one GCB case were negative
(see Methods; Supplementary Table 16). The percentage of IgM-TLR9
PLA-positive malignant cells was significantly higher (P = 0.0286) in
tumours that responded to ibrutinib than in those that progressed on
treatment (Fig. 5d). In this series, two responding cases with
IgM-TLR9 puncta had CD79B or CD79A mutations, respectively, but lacked
MYD88(L265P), while two other responders were wild-type for these
genes (Supplementary Table 16). These findings demonstrate that
the My-T-BCR supercomplex exists in ABC DLBCL tumours that
respond to ibrutinib, even in those lacking the MYD88(L265P)/CD79A or
MYD88(L265P)/CD79B double-mutant genotype.
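As a rough plausibility check on the reported P = 0.0286, the sketch
below computes the smallest two-sided P value that an exact rank-sum
test can return for two small groups without ties. The group sizes
(four responders, four non-responders), the complete separation of the
two groups and the choice of test are assumptions made for
illustration; the test actually used is the one stated in the Methods.

```python
from math import comb


def min_two_sided_exact_p(n1: int, n2: int) -> float:
    """Smallest two-sided P of an exact rank-sum test without ties.

    Under the null hypothesis every assignment of ranks to group 1 is
    equally likely, so the single most extreme arrangement has
    probability 1 / C(n1 + n2, n1); doubling gives the two-sided value.
    """
    return 2 / comb(n1 + n2, n1)


# Assumed split of the eight biopsies (hypothetical, for illustration).
p = min_two_sided_exact_p(4, 4)
print(round(p, 4))  # 0.0286
```

With eight samples split four and four, no rank-based exact test can
give a smaller two-sided P value than this.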

We provide genetic, proteomic, cell biological and functional
evidence for a pro-survival signalling hub, termed the My-T-BCR
supercomplex, that coordinates NF-κB activation in DLBCL and
identifies tumours that respond to therapeutic inhibition of NF-κB
by ibrutinib. This supercomplex is present in a subset of ABC DLBCL
lines and tumours, but is generally absent from GCB DLBCL, which
has an alternative constitutive 'germinal centre' BCR signalling mode,
requiring distinct therapeutic strategies (Fig. 5e). The My-T-BCR
supercomplex provides mechanistic insight into the efficacy of drug
combinations in ABC DLBCL and may aid in the development of
predictive assays to identify patients who would benefit from drugs that
target BCR-dependent NF-κB activation, including BTK inhibitors.
METHODS
Cell culture. Cell lines were grown at 37 °C in the presence of 5% CO2 and
maintained in RPMI supplemented with fetal bovine serum (Tet tested, Atlanta
Biologicals) and 1% penicillin/streptomycin and 1% L-glutamine (Invitrogen),
except for OCI-Ly10 and OCI-Ly3 which were grown in IMDM supplemented
with 20% heparinized human plasma, 1% penicillin/streptomycin and 55 μM
β-mercaptoethanol. All cell lines were regularly tested for mycoplasma using the
MycoAlert Mycoplasma Detection Kit (Lonza) and DNA fingerprinted by
examining 16 regions of copy number variants (Jonathan Keats, personal
communication). OCI-Ly3, although present in the database of commonly
misidentified cell lines maintained by ICLAC, was included in this study as a
necessary model of a BCR-independent, MYD88(L265P) mutant,
MYD88-dependent ABC DLBCL. This cell line was authenticated by DNA
fingerprinting and compared to historical DNA controls.

Cas9 vector construction. pRetroCMV/TO-Cas9-hygro was created by ligating
the tetracycline-inducible CMV promoter from pcDNA4/TO (Invitrogen) with
MluI/XbaI and blunt cloned into the XhoI/EcoRI-digested pRetrosuper vector.
The puromycin-resistance gene from pRetroCMV/TO was removed with
StuI/ClaI and replaced with PGK-hygromycin, which was isolated from
pMSCV-Hygro (Clontech) with AgeI/HindIII and similarly cloned into
pRetroCMV/TO. Cas9 was isolated from the LentiCrispr v2 (Addgene 52961)
and blunt cloned into pRetroCMV/TO-hygro digested with AgeI/BamHI.
pCW-Cas9-Blasticidin was generated from pCW-Cas9-puro, which was
purchased from Addgene (50661) and digested with BamHI and XbaI to
remove the puromycin-resistance gene. A gBlock (IDT) containing the
blasticidin-resistance gene was Gibson cloned into the cut vector with
12-base-pair overlaps.

Cas9 clone generation. Cell lines were transduced several times with either
TO-Cas9-hygro or pCW-Cas9, selected and dilution cloned. Single-cell clones
were picked and tested for functional Cas9 cutting after transduction with sgRNAs
that target surface markers including CD20 or ICAM1. Clones were selected based
on loss of surface expression within the transduced population as measured by
FACS 4-14 days after the addition of doxycycline.

sgRNA vector and cloning. The pLKO-based sgRNA vector was purchased from
Addgene (52628). The puromycin gene was removed and replaced with a
puro-GFP fusion protein previously described using Gibson assembly. The resulting
plasmid was digested with BfuAI and incubated with shrimp alkaline phosphatase
before isolating the backbone. Complementary sgRNA sequences flanked by
ACCG on the 5′ end, and CTTT on the 3′ end of the reverse strand, were annealed,
diluted and ligated into the cut vector with T4 ligase according to the
manufacturer's instructions. All transformations were performed in Stbl bacteria
and grown at 30 °C.

sgRNA library construction. The genome-wide Brunello sgRNA library was
purchased from Addgene and transformed in Stbl4 bacteria from Invitrogen.
The Brunello library contains 77,441 sgRNAs targeting four unique positions in
most (19,114) protein-coding genes, along with 1,000 negative control sgRNAs.
Sequences for the follow-up library of 12,472 sgRNAs were chosen from
published sgRNA libraries or were designed using the online tool at
http://crispr.mit.edu. The library (CustomArray Inc.) of 74-mers consisted of
the sgRNA sequence prepended with the oligonucleotide sequence
GGAAAGGACGAAACACCG and followed by
GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC.
The oligonucleotide library was PCR amplified with Herculase II Fusion DNA
polymerase (Agilent) using ArrayF and ArrayR. The subsequent PCR product
was gel extracted using an eGel from Invitrogen and 20 ng of library was
Gibson cloned into BfuAI-cut sgRNA vector following the manufacturer's
instructions. Transformations were grown at 30 °C overnight on 24.5-cm
bioassay plates maintaining at least 100× coverage. Colonies were scraped,
spun and DNA was isolated with Blood and Cell Culture DNA Maxi kits
(Qiagen).

Virus production and transduction. Lentiviruses were produced in 293FT cells
by transfecting sgRNA vectors with packaging vectors pPAX2 (Addgene 12260)
and pMD2.g (Addgene 12258) in a 4:3:1 ratio in serum-free Opti-MEM. Trans-IT
293T (Mirus) was added and incubated for 15 min before adding dropwise to cells.
Supernatants were harvested 24, 48 and 72 h later, spun at 1,000g to pellet any
virus-producing cells and then incubated with Lenti-X concentrator (Clontech).
Virus was concentrated according to the manufacturer's instructions, aliquoted and
frozen. Virus titration was performed on target cell populations and GFP was
measured 3-4 days later. When GFP was not present in the backbone of the sgRNA
plasmids, transduced cells were split and incubated with or without puromycin
until untransduced control cells were dead. The percentage of viable cells was then
measured by FACS and percent transduction was calculated as the ratio of viable
cells in treated versus untreated wells.
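The titration arithmetic for GFP-negative backbones can be sketched as
follows. The function name and the example percentages are
hypothetical; only the ratio itself (viable cells with puromycin over
viable cells without) comes from the text.

```python
def percent_transduction(viable_with_puro: float,
                         viable_without_puro: float) -> float:
    """Percent transduction from matched selected and unselected wells.

    Both arguments are the percentage (or count) of viable cells measured
    by FACS once the untransduced control has died under puromycin.
    """
    if viable_without_puro <= 0:
        raise ValueError("the untreated well must contain viable cells")
    return 100.0 * viable_with_puro / viable_without_puro


# Hypothetical example: 24% viable under selection, 80% viable without.
print(percent_transduction(24.0, 80.0))  # 30.0
```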

Pooled sgRNA screening. For both genome-wide and targeted follow-up screens,
individual replicates were transduced such that an average of 500 copies of each
sgRNA was present after selection. Cultures were carried for the duration of the
21-day screen maintaining 500× coverage. Antibiotic selection was started 3-4
days after transduction and carried out until untransduced control cells were dead,
approximately 4-5 days later. Cells were then harvested for a day 0 time point and
doxycycline was added to the culture media at 200 ng ml−1 final concentration.
Transduced cells were counted and passaged every two days with fresh media
containing doxycycline until day 21, when cells were again collected for DNA
extraction. DNA was isolated from frozen cell pellets using Qiagen QIAamp DNA
Blood Midi and Maxi kits.
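To make the coverage requirement concrete, the sketch below computes
the minimum number of transduced cells that must be carried at each
passage to keep 500 copies of every sgRNA. The library sizes and the
500× target are taken from the text; the function name is
illustrative.

```python
def cells_for_coverage(n_sgrnas: int, coverage: int = 500) -> int:
    """Minimum transduced cells needed for `coverage` copies per sgRNA."""
    return n_sgrnas * coverage


# Library sizes stated in the Methods.
genome_wide = cells_for_coverage(77_441)  # Brunello library
follow_up = cells_for_coverage(12_472)    # targeted follow-up library
print(genome_wide, follow_up)  # 38720500 6236000
```

At 500× coverage the genome-wide screen therefore needs roughly 39
million transduced cells per replicate at every passage, and the
follow-up screen about 6 million.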
Library amplification, sequence extraction and PCR primers. For both screens,
sgRNA sequences were amplified using a nested PCR to first isolate the sgRNA
sequence from genomic DNA and then to add next-gen sequencing adapters
compatible with Illumina's NextSeq500. Products were amplified using ExTaq
(Takara) for 18 cycles in both rounds of amplification. Products were size selected
using an eGel (Invitrogen) and libraries were quantitated using an
Illumina-specific Kapa quantification kit according to the manufacturer's
instructions or by Qubit (Thermo Fisher Scientific). All libraries were sequenced
using a high-output single-read 75-cycle flow cell. An average of 400× (200-700×)
sequencing depth was achieved. Libraries were multiplexed using indexes
compatible with the Illumina TruSeq HT kit with the primers below, in which S
denotes an 8-base-pair index and y represents a variable-length adaptor inserted to
prevent monotemplate. In total, 8 forward primers and 12 reverse primers were
used following this format, such that 96 samples could be multiplexed. BaseSpace
Sequence Hub (https://basespace.illumina.com/home/index) was used to evaluate
sequencing quality measures and to demultiplex sequencing reads. Sequences were
aligned to the sgRNA library allowing for a one-base-pair mismatch using custom
scripts and Bowtie2 version 2.2.9 with the following parameters: -p 16 --local
-k 10 --very-sensitive-local -L 9 -N 1.
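For readers who want to reproduce the alignment step, the sketch below
assembles a Bowtie2 command line with the parameters listed above. The
index prefix and file names are hypothetical placeholders, and the
authors' custom scripts are not reproduced; only the alignment flags
come from the text.

```python
from typing import List


def bowtie2_command(index_prefix: str, fastq: str, sam_out: str) -> List[str]:
    """Build a Bowtie2 argument list using the parameters from the Methods."""
    return [
        "bowtie2",
        "-p", "16",                # 16 alignment threads
        "--local",                 # local (soft-clipped) alignment
        "-k", "10",                # report up to 10 alignments per read
        "--very-sensitive-local",  # most sensitive local preset
        "-L", "9",                 # seed length of 9
        "-N", "1",                 # allow one mismatch in the seed
        "-x", index_prefix,        # hypothetical index of sgRNA sequences
        "-U", fastq,               # single-end reads
        "-S", sam_out,             # SAM output
    ]


# Hypothetical file names, for illustration only.
cmd = bowtie2_command("sgrna_library", "sample_01.fastq.gz", "sample_01.sam")
print(" ".join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)`
on a machine where Bowtie2 is installed.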
CRISPR screen primers. First PCR forward primer:
AATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCG; first PCR reverse primer:
GTAATTCTTTAGTTTGTATGTCTGTTGCTATTATG; second PCR forward primer:
AATGATACGGCGACCACCGAGATCTACACSACACTCTTTCCCTACACGACGCTCTTCCGATCTyTCTTGTGGAAAGGACGAAACACCG;
second PCR reverse primer:
CAAGCAGAAGACGGCATACGAGATSGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTtctactattctttcccctgcactgt.
PCR amplification for sgRNA library construction: ARRAY-F:
TAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCG;
ARRAY-R:
ACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC.
Pooled sgRNA screen analysis. CSS were calculated using the following formulas.

Step 1. Normalize raw counts by total read counts:

$$N_{icrt} = 1 + \kappa \, \frac{X_{icrt}}{\sum_{i} X_{icrt}}$$

where $\kappa$ stands for the fixed scaling constant of the original formula.

Step 2. Eliminate sgRNAs with low counts across all experimental conditions
by calculating a summary count $m_i$ for each sgRNA and eliminating those
sgRNAs for which $m_i < 100$.

Step 3. Calculate log ratios (LR):

$$LR_{icr} = \log_2 \left( N_{icr,21} / N_{icr,0} \right)$$

Step 4. Z-transform log ratios:

$$Z_{icr} = \frac{LR_{icr} - \mathrm{mean}_i(LR_{icr})}{\mathrm{standard\ deviation}_i(LR_{icr})}$$

Step 5. Average across replicates:

$$S_{ic} = \mathrm{mean}_r(Z_{icr})$$

Step 5. Calculate signal variance (SV) for each sgRNA, using total variance (TV)
and error variance (EV):

$$TV_i = \mathrm{var}_c(S_{ic})$$

$$EV_i = \mathrm{mean}_c\left( \mathrm{var}_r(Z_{icr}) / R_c \right)$$

$$SV_i = TV_i - EV_i$$

Step 6. For each gene $g$, let $G_g$ be the set of sgRNAs that represent it and
calculate the maximal pairwise correlation between any two sgRNAs in this set:

$$C_g = \max_{k,l \in G_g,\; k \neq l} \mathrm{correlation}(S_k, S_l)$$

If for a given gene $C_g < 0.45$, then let $i$ be the index of the sgRNA
representing that gene that has the highest signal variance and use that as the
sole representative of gene $g$: $\mathrm{Score}_g = S_i$.

If $C_g > 0.45$, then proceed to steps 7 and 8.

Step 7. Average the two sgRNAs that were most correlated within gene $g$:

$$M_g = (S_k + S_l) / 2$$

in which $k, l \in G_g$ are such that $\mathrm{correlation}(S_k, S_l) = C_g$.

Step 8. For each $i \in G_g$, calculate

$$V_i = \mathrm{correlation}(S_i, M_g) / C_g$$

Average together those sgRNAs for which $V_i > 0.85$ to arrive at the final CSS
for gene $g$:

$$S_g = \mathrm{mean}_{i:\, V_i > 0.85}(S_i)$$

In which $i$ denotes the [1-77,441] index indicating sgRNA; $c$ denotes the
[1-11] index indicating cell line; $R_c$ denotes the number of replicates for
cell line $c$; $r$ denotes the [1-$R_c$] index indicating replicate; $t$ denotes
the (0, 21) index indicating time point; $X_{icrt}$ indicates the raw sequencing
counts for sgRNA $i$ in replicate $r$ of cell line $c$ at time $t$.
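The gene-level part of this procedure (steps 6 to 8) can be sketched
in a few lines of Python. This is an illustrative reading of the steps
above, not the authors' code: the function names are invented, each
sgRNA is represented by its vector of replicate-averaged z-scores
across cell lines, every gene is assumed to have at least two sgRNAs,
and the signal variances from step 5 are passed in precomputed.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gene_css(S, SV, c_min=0.45, v_min=0.85):
    """Collapse the sgRNA score vectors of one gene into a gene-level score.

    S  : list of per-sgRNA vectors (one value per cell line), from step 5
    SV : list of signal variances, one per sgRNA
    """
    # Step 6: maximal pairwise correlation C_g and the pair (k, l) reaching it.
    pairs = [(pearson(S[k], S[l]), k, l)
             for k in range(len(S)) for l in range(k + 1, len(S))]
    c_g, k, l = max(pairs)
    if c_g < c_min:
        # Poorly correlated guides: keep only the highest-signal sgRNA.
        best = max(range(len(S)), key=lambda i: SV[i])
        return list(S[best])
    # Step 7: average the two best-correlated sgRNAs.
    M = [(a + b) / 2 for a, b in zip(S[k], S[l])]
    # Step 8: keep sgRNAs that track M, then average them per cell line.
    kept = [s for s in S if pearson(s, M) / c_g > v_min]
    return [mean(col) for col in zip(*kept)]


# Toy gene with three sgRNAs scored in four cell lines (made-up numbers).
S = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [4.0, 1.0, 3.0, 2.0]]
print(gene_css(S, SV=[1.0, 1.0, 1.0]))
```

In the toy example the first two guides agree closely and the third
does not, so the third is excluded and the gene score is the mean of
the first two.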

For the replication sgRNA library, in which most genes had 9-10 sgRNAs per
gene, we found this could be simplified by using the z-scores of the averaged
log2 fold change of all sgRNAs per gene. As described above, using the best
correlated sgRNAs per gene excluded many poor-performing sgRNAs. It also
excluded many high-performing sgRNAs that shared expected subtype-specific
Statistical significance. The statistical significance in Fig. of the CSS of ABC,
BCR-dependent, or GCB DLBCL, compared to all other cell lines, was calculated
with a two-sided random variance t-test for individual genes. Screen correlations
were calculated using a Pearson correlation on gene-level metrics in GraphPad
Prism 7.0 software on genes displayed in Extended Data Fig. 2.
FACS analysis. Cell lines were transduced with sgRNA vectors marked by GFP.
Three to four days after transduction, GFP levels were measured by flow cytometry
on a BD FACS Calibur using CellQuest Pro version 6.0 and analysed with FlowJo
version 9. Cells were split every other day into doxycycline-containing media and
GFP levels were followed for 14 days and normalized to the day 0 measurement.
All sgRNA and shRNA sequences are listed below. When surface proteins were
targeted, knockout was validated by flow cytometry by spinning cells down,
washing in FACS buffer (PBS plus 2% (v/v) FBS, 1 mM EDTA), and staining in
FACS buffer with fluorescently labelled antibodies: mouse anti-human
CD19-APC (Biolegend SJ25C1, 1:500), mouse anti-human CD81-PE (Biolegend
5A6, 1:500), mouse anti-human IgM-APC (MHM-88, 1:100); or from Southern
Biotech: goat anti-human IgG-PE (1:200).
Drug sensitivity assays. DLBCL cell lines were enumerated and 10,000 cells were
seeded in triplicate in a 96-well plate in fresh media. Ibrutinib (Selleck Chem)
was dissolved in DMSO and equal volumes of diluted drug were added to cells to
reach the indicated final concentration. Cells were cultured with drugs, which were
replenished after 48 h. Metabolic activity was measured by adding 10 μl
MTS reagent (Promega) and incubating at 37 °C for 4 h. Absorbance was measured
at 490 nm using a 96-well Tecan Infinite 200 Pro plate reader. Absorbance values
from media-only wells were subtracted and data were normalized to DMSO control
unless otherwise stated. GR50 was calculated using the online tool GRcalculator
(http://www.grcalculator.org/). Drug matrix screens and ΔBliss calculations were
performed as previously described.
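The background subtraction and DMSO normalization described above
amount to the following calculation. The function name and the
absorbance readings are hypothetical; averaging the replicate wells
before subtraction is an assumption about how replicates were
combined.

```python
from statistics import mean


def normalized_viability(sample_wells, media_only_wells, dmso_wells):
    """Background-subtract absorbances and express them relative to DMSO.

    Each argument is a list of 490 nm absorbance readings from replicate
    wells. Returns one value per sample well, where 1.0 equals the DMSO
    control.
    """
    blank = mean(media_only_wells)      # media-only background
    control = mean(dmso_wells) - blank  # background-corrected DMSO signal
    if control <= 0:
        raise ValueError("DMSO control must exceed the media-only background")
    return [(a - blank) / control for a in sample_wells]


# Hypothetical triplicate readings for one drug concentration.
print(normalized_viability([0.62, 0.58, 0.60], [0.10, 0.10, 0.10],
                           [1.10, 1.12, 1.08]))
```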
Gene expression profiling and signature enrichment. Cells were transduced
with shRNA, puro-selected and collected at indicated times after shRNA
induction. RNA was isolated using RNeasy mini kits (Qiagen). Gene expression was
assessed using two-colour human Agilent 4×44K gene expression arrays following
the manufacturer's protocol. In brief, control shSC4 (control, Cy3-labelled) RNA
was compared to RNA from cells with shRNAs targeting TLR9 (C4), TLR9 (D7),
MyD88 (A7) or MyD88 (B3) (Cy5-labelled) at each of the indicated time points.
Array elements were filtered for spot quality using Agilent Feature Extraction
software version 10.7. Specific genes were determined to be downregulated if the
log2 fold change (comparing control shSC4 to shRNA for TLR9) was less than −0.3
for at least three of the four time points (shTLR9) per cell line. Signature
enrichment was performed as previously described. In brief, downregulated genes
were tested for overlap with published gene signatures in a 2 × 2 contingency
table using a Fisher's exact test.
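The two rules in this paragraph, the downregulation call and the
overlap test, can be sketched as below. The function names and toy
numbers are illustrative. The −0.3 threshold and the "three of four
time points" rule come from the text, whereas the one-sided
(enrichment) form of Fisher's exact test is an assumption, since the
sidedness is not stated here.

```python
from math import comb


def is_downregulated(log2_fc, threshold=-0.3, min_points=3):
    """True if enough time points fall below the log2 fold-change threshold."""
    return sum(v < threshold for v in log2_fc) >= min_points


def fisher_enrichment_p(overlap, n_down, n_signature, n_total):
    """One-sided Fisher's exact P for a 2 x 2 overlap table.

    Probability of seeing at least `overlap` signature genes among
    `n_down` downregulated genes drawn from `n_total` measured genes,
    of which `n_signature` belong to the signature (hypergeometric tail).
    """
    max_k = min(n_down, n_signature)
    tail = sum(comb(n_signature, k) * comb(n_total - n_signature, n_down - k)
               for k in range(overlap, max_k + 1))
    return tail / comb(n_total, n_down)


# Toy example: four time points for one gene, then a small overlap test.
print(is_downregulated([-0.5, -0.4, -0.1, -0.35]))  # True
print(fisher_enrichment_p(overlap=3, n_down=3, n_signature=3, n_total=10))
```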
DNA copy number analysis. DLBCL DNA samples were analysed with the
Affymetrix SNP6.0 array. Probe log ratios were calculated using Affymetrix
Genotyping Console, and were collected into segments of similar value using
circular binary segmentation
(https://bioconductor.org/packages/release/bioc/html/DNAcopy.html). These
segments were assigned copy number values as previously described without
segment length restrictions. DNA copy number was correlated to sample gene
expression using linear regressions calculated with GraphPad Prism version 7.
Amplified regions were identified and visualized on the UCSC genome browser.
NF-κB reporters. The generation of the IκBα-luciferase reporter cell line has been
previously described. In brief, the TMD8-IκBα cell line was transduced with
indicated shRNA, puro-selected and induced with doxycycline. Cells were collected
at the indicated time points and luciferase was measured with the Dual Luciferase
Reporter Assay System (Promega) on a Tecan Infinite 200 Pro plate reader.
IgM co-immunoprecipitation. HBL1, TMD8, OCI-Ly10 and OCI-Ly19 cells were
lysed at 10^7 cells per ml in a modified RIPA buffer (0.5% Triton X-100, 0.25%
deoxycholate, 0.025% SDS, 10 mM Tris, pH 8.0, 100 mM NaCl, 10 mM EDTA, 1 mM
Na3VO4, 30 mM pyrophosphate, 10 mM glycerophosphate, 1 mM AEBSF, 0.02 U ml−1
aprotinin and 0.01% NaN3) for 10 min on ice. Lysates were cleared by
centrifugation at 14,000g for 20 min at 4 °C. IgM was immunoprecipitated by
incubating lysates on ice for 1 h with 10 μg of biotin-labelled goat anti-human IgM
(Jackson Immunoresearch) followed by the addition of 35 μl of pre-washed
streptavidin-agarose beads (Invitrogen) and rotated for 30 min at 4 °C. Beads
were washed three times with cold 1× RIPA buffer, then solubilized by adding
2× LDS sample buffer (Invitrogen) with 1% β-mercaptoethanol and boiled for
5 min. Samples were separated on a 10% polyacrylamide gel and transferred to
Immobilon-P PVDF membrane (Millipore) for western blot analysis. Membranes
were probed with rabbit anti-TLR9 monoclonal XP, rabbit anti-TLR7 (Cell
Signaling Technologies), rabbit anti-TLR4 (Santa Cruz Biotechnology) and goat
anti-IgM-HRP (Bethyl).
PLA. DLBCL cell lines were left untreated, treated with 10 nM ibrutinib, 200 nM
AZD2014 or equivalent volumes of DMSO, or transduced with control shRNA
(SC4) or shRNAs targeting CD79A, TLR9, MYD88, CARD11, BCL10 or MALT1,
followed by puromycin (Invitrogen) selection as previously described. Cells
were plated on a 15-well μ-Slide Angiogenesis ibiTreat chamber slide (Ibidi) and
allowed to adhere to the surface for 30 min at 37 °C. Cells were then fixed with
4% paraformaldehyde (Electron Microscopy Sciences) for 20 min at room
temperature and then washed in PBS (Invitrogen). Cellular membranes were
labelled with 5 μg ml−1 WGA conjugated to Alexa Fluor 488 (Thermo Fisher
Scientific) for 10 min at room temperature. Cells were permeabilized in cold
methanol for 10 min, washed in PBS and then blocked in Duolink Blocking buffer
(Sigma) for 30 min at room temperature. Primary antibodies were diluted in
Duolink Antibody Diluent (Sigma) and incubated overnight at 4 °C (see
Supplementary Table 8). Where appropriate, cells were counterstained with mouse
anti-LAMP1 conjugated to Alexa Fluor 405 (Santa Cruz Biotechnology) with the
primary antibodies. The next morning cells were washed for 20 min in a large
volume of PBS with 1% BSA, followed by addition of the appropriate Duolink
secondary antibodies (Sigma), diluted and mixed according to the manufacturer's
instructions. Cells were incubated for 1 h at 37 °C, after which cells were washed
in TBST with 0.5% Tween-20 for 10 min. Ligation and amplification steps of the
PLA were performed using the Duolink in situ Detection Reagents Orange kit
(Sigma) according to the manufacturer's instructions. Following the PLA, cells
were mounted in Prolong Gold mounting media with DAPI (Invitrogen). Images
were acquired on a Zeiss LSM 880 Confocal microscope using Zeiss Zen Black
version 2.3. Images for display and Pearson's correlation coefficient values were
calculated with NIH ImageJ/FIJI software version 2.0.0-rc-65. PLA spots were
counted in cell lines using Blobfinder version 3.2. PLA scores were determined by
normalizing the number of PLA spots counted in each sample to the average
number of PLA spots counted in the control sample, which was set to 100. Box and
whisker plots display the median PLA score with whiskers incorporating 10-90%
of all data; outliers are displayed as dots.
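The PLA score normalization described above reduces to a simple
rescaling, sketched below. The function name and the spot counts are
hypothetical; only the rule (control mean set to 100) comes from the
text.

```python
from statistics import mean, median


def pla_scores(sample_spots, control_spots):
    """Rescale per-cell PLA spot counts so the control mean equals 100."""
    control_mean = mean(control_spots)
    if control_mean <= 0:
        raise ValueError("control sample must contain PLA spots")
    return [100.0 * n / control_mean for n in sample_spots]


# Hypothetical per-cell spot counts from Blobfinder.
control = [8, 10, 12]
treated = [5, 10, 20]
scores = pla_scores(treated, control)
print(scores, median(scores))  # [50.0, 100.0, 200.0] 100.0
```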

The PLA was performed on formalin-fixed, paraffin-embedded (FFPE) tissue
microarrays or biopsy samples in a similar manner. FFPE microarrays (7 μm) and
patient tissue sections (4 μm) were deparaffinized in xylene and rehydrated in
graded alcohol and distilled water. Heat-induced antigen retrieval was performed
on tissue microarrays and tissue sections at pH 6.0 for 30 min. Slides were then
placed in Tris-buffered solution and prepared for proximity ligation assay, as
described above. Samples were costained with mouse anti-human CD20-eFluor
or Alexa Fluor 488 (L26, eBioscience). Data were analysed using Blobfinder version
3.2. Cells with 10 or more IgM-TLR9 puncta in their nucleus were removed from
analysis to control for increased autofluorescence in FFPE samples. Tissue
microarrays were prepared by fixing cells in neutral buffered formalin for 24 h,
pelleting and resuspending in an equal volume of low-melt agarose in a 10 ml
conical tube. The resulting pellet was paraffin embedded by standard protocol.
The resultant blocks were used to construct a cell line array (CMA) using the same
approach used for construction of tissue microarrays, with 1.00 mm needles using
a Beecher MTA-1 instrument (Beecher Instruments). Sample identifiers were
removed and blinded before pathology review for PLA signal. After all data were
collected, sample identifiers were revealed and samples were grouped by response
to ibrutinib.
BioID2 constructs. BioID2 (Addgene 80899) was appended to the C terminus
of TLR9 and MYD88(L265P) using Gibson cloning techniques. MYD88(L265P)-
13X-BioID2 was cloned by removing GFP from the previously described pBMN-
MYD88(L265P)-VD-GFP by restriction digest with StuI and NotI. BioID2
was PCR amplified with a 13× N-terminal linker and Gibson cloned as above from
Addgene 80899 with the following primers: MYD88.Cterm/13X:
CTGGACTCGCCTTGCCAAGGCCTTGTCCCTGCCCGGTGGAGGCGGGTCTGGAGGC;
pBMN-NotI-BioID2-Cterm: CCTCTAGTGCGGCCGCTTATGCGTAATCCGGTACATC.
BioID2 was also appended to the C terminus of both wild-type
and mutant isoforms of MYD88 with a two-amino-acid linker (VD). First,
BioID2 and MYD88 were PCR amplified with Primestar (Takara) using the
following primers: BioID FWD: TTGTCCCTGCCCGTCGACTTCAAGAACCTGATCTGGCTG;
BioID REV: CGCCGGCCCTCGAGGCTATGCGTAATCCGGTACATCG; MYD88 FWD:
AATTCGAATTCCTGAAGGGCCACCATGCGACCCGACCGCGC; MYD88 REV:
AGATCAGGTTCTTGAAGTCGACGGGCAGGGACAAGGC. TLR9 was cloned into a modified
version of pBMN that expresses a 10× linker followed by BioID2 with the
following oligonucleotides.
‘TLR9 C-BiolD FWD:CTGCCGGATCE 
cat 
Gt@acrecerace. 
The PCR products were separated on a 1% agarose gel and column purified
(Qiagen). Purified PCR products were mixed and added to pBMN-LYT2 vector
that was linearized with StuI (New England Biolabs) and subjected to a Gibson
reaction (New England Biolabs) following the manufacturer's protocol.
Imaging MYD88-13X-BioID2. MYD88(L265P)-13X-BioID2 was retrovirally
transduced into TMD8 cells and then purified with anti-LYT2 beads as described
above. Cells were first cultured for 16 h in 50 μM biotin. Next, cells were incubated
with goat anti-human IgM Fab conjugated to Alexa Fluor 488 (Jackson
Immunoresearch) for 90 min at 37 °C. During this incubation period, cells were
plated onto a 15-well μ-Slide ibiTreat chamber slide (Ibidi) for 30 min and
allowed to stick to the slides. Cells were washed twice with PBS and then fixed with
4% paraformaldehyde (Electron Microscopy Sciences) for 20 min and then
permeabilized with cold methanol for 10 min at −20 °C. Fixed and permeabilized
cells were blocked with Duolink blocking buffer (Sigma) for 30 min at room
temperature. Cells were then incubated with rabbit monoclonal antibody
anti-phospho-IKKα/β (Ser176/180) (Cell Signaling Technology) diluted 1:200 in
PBS with 1% BSA for 2 h at room temperature, followed by two washes with
BSA/PBS. Cells were then incubated with anti-rabbit F(ab′)2 conjugated to Alexa
Fluor 555 (Cell Signaling Technology) at 1:1,000, mouse anti-LAMP1 conjugated
to Alexa Fluor 405 (Santa Cruz Biotechnology) at 1:50 and streptavidin conjugated
to Alexa Fluor 647 (Biolegend) at 1:100, all diluted in BSA/PBS and allowed to
incubate for 1 h at room temperature. Cells were then washed for 15 min in a large
volume of PBS/BSA and mounted with Prolong Diamond mounting media
(Invitrogen). Images were acquired on a Zeiss LSM 880 Confocal microscope.
Images for display were prepared with NIH ImageJ/FIJI and animations were
prepared using the Imaris 3D rendering software (Bitplane). The number of BioID2
puncta and their intensity were quantified from z-stack images (1 μm slices) using
Blobfinder.
In certain instances, TMD8, OCI-Ly10 and/or HBL1 cells expressing
MYD88(L265P)-13X-BioID2 were also transduced with either control shRNA
(SC4) or shRNAs targeting CD79A, TLR9 or MYD88, as described above. After
puromycin (Invitrogen) selection, cells were stained with streptavidin conjugated
to Alexa Fluor 385 (Thermo Fisher Scientific) and anti-LYT2 conjugated to Alexa
Fluor 647 (Biolegend). Cells were either subjected to FACS analysis as described
above, or were imaged as described above. Biotin spots or blobs were counted
using Blobfinder, as for the PLA above. Likewise, these cell lines were either left
untreated, treated with 10 nM ibrutinib or equivalent volumes of DMSO, and then
stained and analysed in the same manner.
Mass spectrometry and western blot analysis of BioID2 constructs. TLR9-10X-
BioID2 pBMN-LYT2 and MYD88-13X-BioID2 pBMN-LYT2 constructs were
retrovirally transduced into DLBCL cell lines as described above. Infected cells
were enriched by positive selection with LYT2 magnetic beads (Invitrogen). Cells
were then grown in SILAC media, containing arginine and lysine labelled with
stable isotopes of arginine and lysine, for 2 weeks before expansion to 100 × 10^6
cells. In certain cases, cells were treated with either 10 nM ibrutinib or 200 nM
AZD2014 for 4 h. Then, 16 h before lysis, biotin (Sigma) was added to a final
concentration of 50 μM to transduced cells. Cells were then lysed at 2.5 × 10^7 cells
per ml in RIPA buffer modified for mass spectrometry analysis (1% NP-40, 0.5%
deoxycholate, 50 mM Tris, pH 7.5, 150 mM NaCl, 1 mM Na3VO4, 5 mM NaF,
1 mM AEBSF) for 10 min on ice. Lysates were cleared by centrifugation at 1,000g
for 20 min at 4 °C. Pre-washed streptavidin agarose beads (35 μl) were added to
each sample; samples were then rotated at 4 °C for 2 h, then washed four times in
1× RIPA buffer, then solubilized with 4× LDS sample buffer (Invitrogen) with
1% β-mercaptoethanol, and boiled for 5 min. A fraction of lysates were also
subjected to western blot analysis as described above. Western blots were probed
with rabbit anti-CARD11 and rabbit anti-MALT1 (Cell Signaling Technologies)
and mouse anti-MYD88 (Santa Cruz Biotechnology).

For mass spectrometry analysis, proteins were separated by one-dimensional gel
electrophoresis (4-12% NuPAGE Bis-Tris Gel) and the entire lane of Coomassie
blue-stained gel was cut into 20 slices. All slices were processed as described
previously. After tryptic digestion of the proteins the resulting peptides were
resuspended in sample loading buffer (2% acetonitrile and 0.03% trifluoroacetic
acid) and were separated by an UltiMate 3000 RSLCnano HPLC system (Thermo
Fisher Scientific) coupled online to a Q Exactive HF mass spectrometer (Thermo
Fisher Scientific). First, peptides were desalted on a reverse phase C18 precolumn
(Dionex, 5 mm length, 0.3 mm inner diameter) for 3 min. After 3 min the
precolumn was switched online to the analytical column (30 cm length, 75 μm inner
diameter) prepared in-house using ReproSil-Pur C18 AQ 1.9 μm reversed phase
resin (Dr. Maisch GmbH). Buffer A consisted of 0.1% formic acid in H2O, and
buffer B consisted of 80% acetonitrile and 0.1% formic acid in H2O. The
peptides were eluted from buffer B (5-42% gradient) at a flow rate of 300 nl min−1.
The temperature of the precolumn and the analytical column was set to
50 °C during the chromatography. The mass spectrometer was operated in a Top30
data-dependent mode in which the 30 most intense precursors from survey MS1
scans were selected with an isolation window of 1.6 Th for MS2 fragmentation
under a normalized collision energy of 28. Only precursor ions with a charge state
between 2 and 5 were selected. MS1 scans were acquired with a mass range from
350 to 1,800 m/z at a resolution of 60,000 at 200 m/z. MS2 scans were acquired
with a starting mass of 110 Th at a resolution of 15,000 at 200 m/z with maximum
IT of 54 ms. AGC targets for MS1 and MS2 scans were set to 1 × 10^6 and 1 × 10^5,
respectively. Dynamic exclusion was set to 20 s.

Mass spectrometry data analysis. Mass spectrometry data analysis was performed
using the software MaxQuant (version 1.6.0.1) linked to the UniProtKB/Swiss-Prot
human database containing 155,990 protein entries and supplemented with 243
frequently observed contaminants via the Andromeda search engine. Precursor
and fragment ion mass tolerances were set to 6 and 20 ppm after initial
calibration, respectively. Protein biotinylation, N-terminal acetylation and methionine
oxidation were allowed as variable modifications. Cysteine carbamidomethylation
was defined as a fixed modification. Minimal peptide length was set to seven amino
acids with a maximum of two missed cleavages. The false discovery rate (FDR) was
set to 1% on both the peptide and the protein level using a forward-and-reverse
concatenated decoy database approach. For SILAC quantification, multiplicity
was set to two or three for double (Lys+0/Arg+0, Lys+8/Arg+10) or triple
(Lys+0/Arg+0, Lys+4/Arg+6, Lys+8/Arg+10) labelling, respectively. At least
two ratio counts were required for peptide quantification. The 're-quantify' option
of MaxQuant was enabled. Data were filtered for low confidence peptides

Xenograft. All mouse experiments were approved by the National Cancer Institute
Animal Care and Use Committee (NCI-ACUC) and were performed in accordance
with NCI-ACUC guidelines and under approved protocols. Approved protocols
allowed tumour growth below 20 mm in any dimension; no animals had
tumours which exceeded these limits. Female NSG (non-obese diabetic (NOD)/
severe combined immunodeficient (SCID)/Il2rg−/−) mice were obtained from
NCI Fredrick Biological Testing Branch and used for the xenograft experiments
between and 8 weeks of age. TMD8 tumours were established by subcutaneous
injection of 10 10" cells in a 1:1 Matrigel/PBS suspension. Treatments were
initiated when tumour volume reached a mean of 20 an. Ibrutinib was prepared
in PBS with 50% (v/v) DMSO and administered intraperitoneally once per day
(Gmghg day”). AZD2014 was prepared in deionized water with 1% (v/v) Tween
80 and administered per os once per day (15 mg kg−1 day−1). For the AZD2014 and
ibrutinib combination, drugs were given at the same concentration and schedule
as single agents. Tumour growth was monitored every other day by measuring
tumour size in two orthogonal dimensions and tumour volume was calculated
by the following equation: tumour volume = (length × width²)/2. Treatment
randomization and experimenter blinding were not possible. Sample size was
estimated based on preliminary experiments. Mice were censored if they died during
FFPE biopsies. All cases were needle aspirates, whole lymph node biopsies or were
obtained from surgically removed specimens. Samples were fixed in 10% buffered
formalin for 18-24 h and paraffin embedded for long term storage. Samples
were studied in accordance with the ethics and principles of the Declaration of
Helsinki and under Institutional Review Board approved protocols from the
National Cancer Institute, National Institutes of Health Protocol Review Office
(protocol numbers 10-C-0181, 10-CN-074 C, 00-C-133, 00-C-133) or Johns
Hopkins School of Medicine (18800154052). Informed consent was obtained from
all patients or given an IRB-waiver as archived tissue submitted for consultation
to the Hematopathology Section. All samples were anonymized or de-identified
for subsequent PLA analysis.
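
The tumour volume calculation in the xenograft methods above can be illustrated with a short sketch. It assumes the equation reads as the common caliper approximation, volume = (length × width²)/2, with both measurements in millimetres; the function names and the example measurements are hypothetical and are not taken from the paper.

```python
def tumour_volume_mm3(length_mm: float, width_mm: float) -> float:
    """Return (length x width^2) / 2 in mm^3 for two orthogonal caliper measurements."""
    if length_mm < 0 or width_mm < 0:
        raise ValueError("dimensions must be non-negative")
    return length_mm * width_mm ** 2 / 2.0


def within_size_limit(length_mm: float, width_mm: float, limit_mm: float = 20.0) -> bool:
    """True while both dimensions stay below the protocol limit (20 mm in the text)."""
    return max(length_mm, width_mm) < limit_mm


if __name__ == "__main__":
    # Hypothetical measurements: 10 mm long, 8 mm wide.
    print(tumour_volume_mm3(10.0, 8.0))  # 10 * 8**2 / 2 = 320.0
    print(within_size_limit(10.0, 8.0))  # True
```

By convention the longer of the two orthogonal measurements is taken as the length, so the squared term is the shorter axis.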

shRNA and sgRNA sequences used in functional assays. shSC4 (MSMO1 ex5)
CICTCAACCCTTTAAATCTGA; shCD19 (3' UTR) GATTCACACCTGACTCTGAAA;
shCD79A (3' UTR) GGGGCTTCCTTAGTCATATTC; shTLR9 #1
(09133) GAGCTAAACCTGAGCTACAAG; shTLR9 #2 (3' UTR) GCACGGTGCCACCTCCACACT;
shMYD88 #1 (3' UTR) GTACCAGTATTTATACCTCTA;
shMYD88 #2 (ex3) GGCATATGCCTGAGCGTTTC; shBCL10 #2 (3' UTR)
CTGACATTGTCTCCTATATA; shCARD11 (3' UTR) GGGGTGTGTACCAGGCTATGA;
sgTLR9 #8 GACCAGGCTCCCGAAGGAAG; sgMYD88 #10
CCGGCAACTGGAGACACAAG; sgUNC93B1 #873 TOTTGCCATACTTCACCTOG;
sgCNPY3 #9 TCAGCACGTGGTTGGCGCAG.

Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Code availability. All computer code is available at http://lymphochip.nih.gov/
local/CRISPR.

Data availability. The gene expression datasets generated for these analyses are
included in the Supplementary Information, or have been deposited in Gene
Expression Omnibus (GEO) under accession number GSE99276. Primary
sequencing data and copy number analysis of DLBCL cases will be made available
through the NIH dbGaP system (accession numbers phs001444, phs001184 and
phs000178; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001444.v1.p1)
and the NCI Genomic Data Commons. All CRISPR screen
data, SILAC mass spectrometry, and genomic data used in the manuscript are
included in the Supplementary Tables.
Extended Data Fig. 1 | CRISPR screen controls. a, Schematic of CRISPR- plots display mean and interquartile data; outliers represent 10% of the
Cas9 screens in lymphoma cell lines. b, 991 negative control non-targeting total dataset. Cumulative CRISPR screen scores for indicated genes are
the z-score of the average log2 fold change of all sgRNAs targeting a given
with an shRNA targeting CD19. Shown is the log ratio of the percentage independent biological replicates. See 'statistics and reproducibility'
10-12 days) versus the initial time point (tia, day 0). ABC lines are expressing cells with knockdown of CD19 or negative control genes.


© 2018 Springer Nature Limited. All rights reserved.


Extended Data Fig. 4 | TLR9 overexpression and association with the
BCR are features of ABC DLBCL. a, Gene expression values (log2 FPKM)
of TLR9-associated genes are shown by DLBCL subtype: ABC in
blue (n = 294), GCB in orange (n = 168) and unclassified (Unc) in grey
(n = 115). Gene expression data were correlated with DNA copy number
and linear regression calculated for ABC samples. *P < 0.05, ***P < 0.001,
linear regression (left); *P < 0.05, ***P < 0.0001, one-way ANOVA and
Tukey's post test (right). b, Amplification of the UNC93B1 and CNPY3
loci (black lines, below chromosome ideogram). Minimal shared amplified
regions in ABC DLBCL biopsies are bracketed and genes displayed below.
c, The essential TLR9 interactome in TMD8. TLR9-BioID2 interactome
determined by SILAC-based mass spectrometry (y axis) plotted by
the CRISPR screen score (CSS, x axis). Bait (TLR9) is labelled in blue.
Essential interactors are labelled in red; those shared with HBL1 (Fig. 2c)
are labelled in dark red. d, Venn diagram of the overlap of TLR9-BioID2
interactors identified by SILAC-based mass spectrometry in experiments
performed in TMD8 and HBL1 ABC lines. The 47 overlapping proteins
are listed. e, The enrichment of 47 overlapping TLR9-BioID2 proximal
proteins is shown (top) relative to their CSS (bottom). Gene names labelled
in red are enriched and toxic to both HBL1 and TMD8. See 'statistics and
reproducibility' section.
Extended Data Fig. 5 | IgM interacts with intracellular TLR9 in
ABC DLBCL lines. a, Whole-cell lysates of indicated DLBCL cell lines
were immunoprecipitated with anti-IgM or isotype control antibodies
before being immunoblotted with IgM or indicated TLR antibodies;
representative blots; n = 3. b, ABC DLBCL cell lines HBL1 and TMD8
were incubated on ice with IgM or isotype control antibodies and lysed.
Lysates were immunoprecipitated (plasma membrane) with IgM or isotype
control. Unbound lysates (cytosolic) were then immunoprecipitated
with IgM or isotype control antibodies before all immunoprecipitated
lysates were immunoblotted with the indicated antibodies; representative
blots, n = 2. c, Left, confocal images of PLA reaction between IgM and
TLR9 in HBL1 and TMD8 cells transduced with control SC4, CD79A or
TLR9 shRNAs. Cells were puromycin selected and shRNAs induced with
dox for two days before being fixed and imaged. Right, quantification
data from three separate experiments. Data are pooled biologically
independent experiments of n > 100 cells scored per condition. Box plots
represent median and 25-75% of data, whiskers display 10-90 percentile.
**P < 0.01, ***P < 0.001, one-way ANOVA with Dunnett's post test.
d, An IgM-TLR9 PLA (red) was performed in a panel of ABC and GCB
DLBCL cell lines and the presence of chronic active BCR signalling
('+' denotes present), MYD88 mutational status and IgH isotype
(IgM or IgG) are displayed. Nuclei were stained with DAPI (blue) and
membranes were visualized by WGA (green). e, The number of puncta per
cell of IgM-TLR9 PLA is quantitated. Box and whisker plots display mean
and interquartile data, whiskers display 10-90 percentile. Data are from
three fields of cells quantified from one representative experiment of three
biologically independent replicates. f, The data from Extended Data Fig. 5e
segregated by ABC (blue, n = 9) and GCB (orange, n = 8) lines. Box plots
represent median and 25-75% of data, whiskers display range. **P < 0.01,
Mann-Whitney unpaired one-tailed t-test. g, IgG-TLR9 PLA (red) was
performed in indicated DLBCL cell lines co-stained with DAPI (blue)
and IgG-AlexaFluor488 (green). MYD88 mutational status, IgH isotype
and presence of chronic active BCR signalling ('+' denotes present,
'−' denotes absent) are displayed. Representative data from two independent
biological replicates. h, To define the cytoplasmic location of the BCR-TLR9
interaction, we counterstained ABC cells for LAMP1, a marker of
late endolysosomes, in which TLR9 resides, and performed PLA between
IgM-TLR9, IgM-LAMP1 and IgM-SYK. The PLA signal is in red, LAMP1
is counterstained in blue. Representative data from three independent
biological experiments. i, To quantify the association between PLA signals
and LAMP1 staining we calculated the Pearson's correlation coefficients
across all pixels in each imaged cell (n = 25 cells per PLA pair). The highest
correlation was between an IgM-LAMP1 PLA and LAMP1 staining
(R = 0.471), whereas the correlation between an IgM-SYK PLA signal
and LAMP1 was much lower (R = 0.153). The correlation between the
IgM-TLR9 PLA signal and LAMP1 staining was intermediate (R = 0.310),
indicating that a significant component of the IgM-TLR9 interaction is
in LAMP1+ vesicles. Quantified data are from one of three independent
biological experiments. j, Quantification of the IgM-TLR9 PLA signal
after ectopic expression of either empty vector, TLR9, wild-type MYD88
or MYD88(L265P). Data pooled from 3 (HBL1) or 2 (TMD8) biologically
independent replicates of > 100 cells scored per condition. Box plots
represent median and 25-75% of data, whiskers display 10-90 percentile.
*P < 0.05, **P < 0.01, ***P < 0.001; one-way ANOVA with Dunnett's post
test. See 'statistics and reproducibility'.
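
The pixel-wise Pearson correlation described in panel i of the caption above can be sketched as follows. This is a minimal illustration in plain Python, assuming that each imaged cell is available as two equally sized 2D intensity arrays (one for the PLA channel, one for the LAMP1 channel); the function names and the toy images are hypothetical, and the caption does not state how the per-cell values were combined into the reported R.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_x * var_y)


def per_cell_correlation(pla_image, marker_image):
    """Correlate two channels across all pixels of one cell (2D lists of intensities)."""
    pla_pixels = [p for row in pla_image for p in row]
    marker_pixels = [p for row in marker_image for p in row]
    return pearson_r(pla_pixels, marker_pixels)


if __name__ == "__main__":
    # Toy 2x2 "images": the second channel is the exact inverse of the first.
    pla = [[0, 1], [2, 3]]
    marker = [[3, 2], [1, 0]]
    print(per_cell_correlation(pla, marker))  # perfectly anti-correlated
```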



Extended Data Fig. 6 | TLR9 knockdown phenocopies MYD88
knockdown. a, TLR9 shRNA is rescued by overexpression of TLR9.
HBL1 cells were transduced with empty vector (EV) or wild-type TLR9
expressing dsRed-Express2 vectors and then with shRNA vectors marked
by GFP targeting a control (SC), MYD88 or TLR9. The percentage of
double-positive cells was monitored by FACS and normalized to day
0. One of three representative biologically independent experiments
is shown. b, Heat map of gene expression values showing the global
phenocopy of MYD88-dependent genes after shRNA-mediated
knockdown of TLR9 or MYD88 in HBL1 at indicated time points. c, Gene
signatures enriched in downregulated genes from HBL1 or TMD8 after
shRNA-mediated knockdown of TLR9. d, Normalized IκBα luciferase
reporter level at indicated time points after knockdown of TLR9 with
indicated shRNAs. Data are mean and s.e.m. of nine technical replicates
from n = 3 independent biological experiments. *P < 0.05, ***P < 0.001;
one-way ANOVA with Sidak's multiple comparison test.
Extended Data Fig. 7 | The MYD88(L265P) interactome in ABC
DLBCL. a, The essential MYD88(L265P) interactome in HBL1.
MYD88(L265P)-BioID2 interactome from SILAC-based mass
spectrometry (y axis) plotted by the CSS (x axis). Bait (MYD88(L265P))
is labelled in blue. Essential interactors are red, with those shared with
either TMD8 or OCI-Ly10 labelled in dark red. b, Venn diagram of the
overlap of MYD88(L265P)-BioID2 interactors in TMD8, OCI-Ly10 and
HBL1 ABC lines. Proteins found in two or more experiments are listed.
c, Lysates of TMD8, HBL1 and U2932 cells transduced with empty vector
or MYD88(L265P)-BioID2, selected and treated with 50 µM biotin for
24 h. Lysates were prepared and immunoprecipitated with streptavidin
before being immunoblotted with CARD11 and MYD88 antibodies. One
representative blot is shown for each cell line from n = 3 biologically
independent experiments (HBL1, TMD8) and n = 1 (U2932). d, Lysates of
TMD8 cells transduced with empty vector, MYD88(L265P) or wild-type
BioID2-fusion proteins, selected and treated with 50 µM biotin for 24 h.
Lysates were prepared and immunoprecipitated with streptavidin before
being immunoblotted with CARD11, MALT1 or MYD88 antibodies;
representative blot; n = 3. e, Confocal image of a PLA of BCL10 with
MYD88. Data pooled from 6 biologically independent replicates of > 200
cells scored per condition. Box plots represent median and 25-75% of data,
whiskers display 10-90 percentile; one-way ANOVA with Dunnett's post
test. f, BCL10-MYD88 and MALT1-MYD88 PLA in ABC (blue, 19)
and GCB (orange, =) lines. Box plots represent median and 25-75% of
data, whiskers display range; Mann-Whitney unpaired, one-tailed t-test.
g, BCL10-CARD11 PLA after shRNA knockdown of indicated genes in
ABC (blue) and GCB (orange) lines. CD79B and MYD88 mutation status
is displayed below each cell line. Data are from 3 fields of cells quantitated
from 1 representative experiment of 3 (HBL1), 2 (BJAB, DOHH2) or 1
(OYB, RIVA) biologically independent replicates of > 90 cells scored
per condition. Box plots represent median and 25-75% of data, whiskers
display 10-80 percentile; one-way ANOVA with Dunnett's post test.
h, ABC lines expressing MYD88(L265P)-BioID2 were treated with
DMSO or 101M ibrutinib for 24 h, and the numbers of biotin puncta
were quantified from confocal images (representative experiment,
n = 3). Fisher's exact test, two-sided. i, SILAC-based mass spectrometry
comparison of MYD88(L265P)-BioID2 interactome in TMD8 cells treated
with DMSO (x axis) versus 10M ibrutinib (y axis). Proteins reduced
upon ibrutinib treatment are shown in red; those similarly decreased in
two separate cell lines (Fig. 4a) are labelled in dark red. Bait (MYD88)
is labelled in blue. Venn diagram showing overlap of proteins decreased
by more than 30% in OCI-Ly10 cells (Fig. 4a) is shown as an inset.
*P < 0.05, **P < 0.01, ***P < 0.001; see 'statistics and reproducibility'.
Extended Data Fig. 8 | IgM-TLR9 PLA identifies ABC samples with
chronic active BCR signalling in tissue microarrays. a, IgM-TLR9 PLA
was performed on a formalin-fixed, paraffin-embedded (FFPE) tissue
microarray of lymphoma cell lines. PLA puncta were quantified and
plotted as the absolute number of spots per cell from at least 2 images
of 1 representative experiment from 3 independent tissue microarray
replicates. Box plots represent median and 25-75% of data, whiskers
display range. Cell lines are divided by putative lymphoma subtype or
presentation. BL, Burkitt lymphoma; BPDC, blastic plasmacytoid dendritic
cell neoplasm; HL, Hodgkin lymphoma; MZL, marginal zone lymphoma;
PMBL, primary mediastinal B cell lymphoma; WM, Waldenström's
macroglobulinaemia. b, Representative confocal fluorescent image from
three independent biological samples of a germinal centre from a reactive
lymph node. IgM-TLR9 PLA is shown in red; CD20 is in green; CD138 is
in white; and DAPI is in blue.
Extended Data Fig. 9 | Waldenström's macroglobulinaemia can utilize
the My-T-BCR supercomplex. a, shRNA-mediated toxicity of indicated
genes in two Waldenström's macroglobulinaemia cell lines (RPCI-WM1
and MWCL-1). Control (SC), CD79, TLR9 or MYD88 shRNAs were
expressed in tandem with GFP and the relative level of GFP was followed
over time by FACS. Data are mean and s.e.m. of independent biological
experiments; see 'statistics and reproducibility' section. b, Confocal images
from one of two representative biologically independent experiments
of the PLA reaction between IgM and TLR9 (red puncta). Cells were
counterstained with DAPI (blue) and WGA (green). Scale bar, 10 µm.
c, Normalized quantification (PLA score) of IgM-TLR9. Data were
quantified from at least 28 cells per condition. Box plots represent median
and 25-75% of data, whiskers display range.
Karyotype engineering by chromosome fusion leads
to reproductive isolation in yeast


Jingchuan Luo, Xiaoji Sun, Brendan P. Cormack & Jef D. Boeke


Extant species have wildly different numbers of chromosomes,
even among taxa with relatively similar genome sizes (for example,
insects). This is likely to reflect accidents of genome history, such
as telomere-telomere fusions and genome duplication events.
Humans have 23 pairs of chromosomes, whereas other apes have 24.
One human chromosome is a fusion product of the ancestral state.
This raises the question: how well can species tolerate a change
in chromosome numbers without substantial changes to genome
content? Many tools are used in chromosome engineering in
Saccharomyces cerevisiae, but CRISPR-Cas9-mediated genome
editing facilitates the most aggressive engineering strategies. Here
we successfully fused yeast chromosomes using CRISPR-Cas9,
generating a near-isogenic series of strains with progressively
fewer chromosomes ranging from sixteen to two. A strain carrying
only two chromosomes of about six megabases each exhibited
modest transcriptomic changes and grew without major defects.
When we crossed a sixteen-chromosome strain with strains with
fewer chromosomes, we noted two trends. As the number of
chromosomes dropped below sixteen, spore viability decreased
markedly, reaching less than 10% for twelve chromosomes. As the
number of chromosomes decreased further, yeast sporulation was
arrested: a cross between a sixteen-chromosome strain and an
eight-chromosome strain showed greatly reduced full tetrad formation
and less than 1% sporulation, from which no viable spores could
be recovered. However, homotypic crosses between pairs of strains
with eight, four or two chromosomes produced excellent sporulation
and spore viability. These results indicate that eight
chromosome-chromosome fusion events suffice to isolate strains reproductively.
Overall, budding yeast tolerates a reduction in chromosome number
unexpectedly well, providing a striking example of the robustness
of genomes to change.

Chromosome engineering in S. cerevisiae is driven by technological
advances. A haploid yeast strain with 33 chromosomes has been
generated by splitting natural chromosomes into smaller chromosomes.
On the other hand, the two largest S. cerevisiae chromosomes,
IV and XII, were fused by homologous recombination, producing
yeast with a 3.2-Mb compound chromosome that grew well. Further
fusions (chromosomes VII-V-XV-IV) generated a 4.3-Mb compound
chromosome with the longest yeast chromosome arm engineered
previously (3.7 Mb), and a haploid chromosome number of n = 12, with
no observed effect on fitness. CRISPR-Cas9 expression in S. cerevisiae
permits efficient engineering without selection. We used this to
push the lower limit of chromosome numbers in S. cerevisiae by fusing
chromosomes, producing a series of strains with progressively fewer
chromosomes without affecting gene content.

At least three potential biological obstacles might hinder engineering
of karyotype. First, studies of the field bean Vicia faba have suggested
that the length limit to chromosome arms is half the average spindle
axis at telophase for normal development. Longer arms might yield
incomplete sister chromatid separation, lagging chromosomes and
micronucleus formation, impairing fertility and development. Second,
as centromeres are deleted, excess kinetochore proteins may cause
problems. Finally, mitotic mechanics may be affected: centromeric
force may not suffice to pull very long chromosomes poleward. The
latter is a particular concern for S. cerevisiae, in which point centromeres
are bound by single microtubules; most organisms with larger
chromosomes have regional centromeres bound by multiple microtubules.
Karyotype engineering in S. cerevisiae, therefore, investigates whether
point centromeres can segregate large chromosomes.

We devised specific paths to evaluate minimization of the number of
chromosomes (n). We first fused all small chromosomes to maximize the
number of chromosomes fused before hitting a potential chromosome
arm length limit. Specifically, we fused chromosomes IX, III and I, then
V and VIII, and finally I and VI to generate an n = 12 strain (Fig. 1).
Fig. 1 | Fusion chromosome paths and strategy. This diagram shows
how we fused chromosomes together from the wild type (n = 16) to
n = 12, n = 8, n = 4 and finally n = 2. The 16 chromosomes are coloured
uniquely and arranged by number. A rule indicates the distance from
the centromere. The centromeres of n = 4 and n = 2 are aligned to the
0 position. Please note that the length of the rDNA array, whose position
is indicated with an asterisk, is omitted. Red lettering indicates the
chromosome that has an active centromere in the compound chromosome.
All the chromosomes are oriented from left to right (bottom to top in the
figure) in the fusion chromosomes. For more details see Extended Data
Fig. 1.
Fig. 2 | Construction and characterization of fusion chromosomes.
a, A schematic showing a CRISPR-Cas9-based method to fuse any two
chromosomes together. CEN3 and CEN1, centromeres of chromosomes
III and I respectively. b, Pulsed-field gel electrophoresis with a standard
protocol for Saccharomyces cerevisiae. c, Pulsed-field gel electrophoresis

We developed a CRISPR-Cas9-based strategy to fuse any two
chromosomes while largely preserving isogenicity (Fig. 2a, Extended
Data Fig. 1a and Extended Data Table 1; see Methods). Pulsed-field
gel electrophoresis, used for karyotyping S. cerevisiae, confirmed the
presence of progressively fewer and larger chromosomes (Fig. 2b) with
all chromosomes shorter than 600 kb merged in the n = 12 strain.

Given that 3.2-Mb and 4.3-Mb compound acrocentric chromosomes
have been engineered previously, we next produced a strain with four
long chromosomes, each about 3 Mb long (for simplicity the length of the
rDNA array is omitted from lengths reported herein; Fig. 1). The serial
disappearance of chromosomes confirmed that sequential fusion events
occurred as planned (Fig. 2c). The n = 4 strain had four chromosomes,
each around 3 Mb (Fig. 2d). At n = 4, the total chromosome number
drops substantially below the smallest number previously achieved in
engineered S. cerevisiae strains. Moreover, n = 4 is also lower than is seen
in any extant Saccharomycetaceae species with point centromeres.

Finally, to generate a strain with only two metacentric chromosomes,
each about 6 Mb long, we planned to fuse the final four chromosomes
in any viable order or orientation (Extended Data Fig. 1c), generating
two versions of n = 3 strains by fusing different chromosomes in the
n = 4 strain (Fig. 2d). To distinguish between these, we refer to them as
n = 3 (yJL381) and n = 3' (yJL410) (Extended Data Table 2). A further
fusion cycle in n = 3 generated an n = 2 strain, with two metacentric
chromosomes each about 6 Mb long (Fig. 2d). Each chromosome in
the n = 2 strain now carries about half of the genomic content, which
is unchanged compared to the n = 16 strain, except for the deletion of
14 centromeres and 28 telomeres.
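
The bookkeeping behind the "14 centromeres and 28 telomeres" figure can be checked with a small sketch. It assumes, consistent with the counts given in the text, that every end-to-end fusion joins two linear chromosomes and removes one centromere and two telomeres; the function name and the returned field names are hypothetical.

```python
def fusion_bookkeeping(start_chromosomes: int, target_chromosomes: int) -> dict:
    """Count what is lost when linear chromosomes are fused end to end.

    Assumption: each fusion joins two chromosomes into one, deleting one
    centromere and the two telomeres at the junction.
    """
    if not 1 <= target_chromosomes <= start_chromosomes:
        raise ValueError("target must be between 1 and the starting number")
    fusions = start_chromosomes - target_chromosomes
    return {
        "fusions": fusions,
        "centromeres_deleted": fusions,
        "telomeres_deleted": 2 * fusions,
        "telomeres_remaining": 2 * target_chromosomes,
    }


if __name__ == "__main__":
    # n = 16 to n = 2: 14 fusions, 14 centromeres and 28 telomeres deleted,
    # leaving four telomeres on the two remaining chromosomes.
    print(fusion_bookkeeping(16, 2))
```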

Fig. 2 (continued) | with Hansenula wingei (also known as Wickerhamomyces canadensis)
chromosomal DNA as a marker. d, Pulsed-field gel electrophoresis with
S. pombe (S.p.) chromosomal DNA as a marker. S.c., S. cerevisiae. n = 3
strain is yJL381; n = 3' strain is yJL410.

We tried to generate an n = 1 strain by multiple strategies, for example,
changing arm length (balanced versus dissimilar), position of
centromere, and remaining telomeres (Extended Data Fig. 1d), but
were unable to produce one. Has the limit of chromosome arm length
been reached? Given that anaphase spindle axis length could extend
up to 10 µm, along with extrapolation from previous measurements
(0.85 µm per 1 Mb), we estimate maximum arm length at about 5.9 Mb
(see Methods) or slightly less than half of the full genome length if
rDNA is included. This may make it more difficult to produce an
n = 1 strain compared to other fusion steps, because the arm length
of a metacentric single chromosome may approach the maximum.
However, as we sampled only a few chromosome fusion paths (of 10°
possible), other paths might lead to an n = 1 strain. An accompanying
study succeeded in producing an n = 1 strain using related methods,
in combination with deleting repetitive regions.
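
The arm-length estimate above follows from two numbers in the text: a spindle axis of up to 10 µm and a compaction of 0.85 µm per Mb, combined with the earlier suggestion that an arm should not exceed half the spindle axis. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the 12 Mb genome size used in the comparison is an assumption derived from the two chromosomes of about 6 Mb each described for the n = 2 strain (rDNA excluded).

```python
def max_arm_length_mb(spindle_axis_um: float = 10.0, um_per_mb: float = 0.85) -> float:
    """Longest chromosome arm (Mb) that fits within half the spindle axis."""
    if spindle_axis_um <= 0 or um_per_mb <= 0:
        raise ValueError("inputs must be positive")
    return (spindle_axis_um / 2.0) / um_per_mb


if __name__ == "__main__":
    limit = max_arm_length_mb()
    print(round(limit, 1))  # (10 / 2) / 0.85 is about 5.9 Mb

    # Assumed genome size of about 12 Mb without rDNA (two chromosomes of ~6 Mb).
    single_metacentric_arm = 12.0 / 2.0
    print(single_metacentric_arm > limit)  # a 6 Mb arm would slightly exceed the estimate
```

Under these assumptions each arm of a single metacentric chromosome sits right at the estimated limit, which matches the text's remark that the n = 1 step may be harder than the earlier fusions.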

We analysed growth and resistance to various stresses for different
n strains. Unexpectedly, strains with between four and sixteen
chromosomes grew well in different media and stress conditions (Extended
Data Fig. 2). Like n = 4, the n = 3 and n = 2 strains lacked obvious
fitness defects (Fig. 3a). These strains are healthy, indicating that
S. cerevisiae can handle large chromosomes and therefore, a regional
centromere is not required to segregate S. pombe-sized chromosomes.
To quantify small differences in fitness, we performed quantitative
competitive growth assays by co-culturing differentially tagged
BY4741 cells and n = 4 or n = 2 cells, and measuring changes in
fluorescence ratios with time. The n = 4 strain grew similarly to the wild
type (98.7 ± 0.3%), whereas the n = 2 strain grew slightly more slowly
(0134049; Fig. 3b).

Fusion of chromosomes might trigger secondary genome or
transcriptome changes. Sequencing of genomes from the n = 2, 4, 8 and 12
strains revealed no evidence of aneuploidy or regional copy number
differences, and few new single nucleotide polymorphisms (SNPs) or
small insertions/deletions (indels) (Extended Data Fig. 3 and Extended
Data Table 3). The numbers of SNPs detected were within threefold
of expected numbers based on reported error rates (Extended Data
Fig. 3c). Two out of eleven SNPs (AMN1 L317F and QDR3 P586S)
were predicted to be deleterious by the Sorting Intolerant From
Tolerant (SIFT) online tool (http://sift.jcvi.org/), which assumes that
important positions in coding sequences of proteins are conserved
during evolution. AMN1 L317F is a potential suppressor of fusion
chromosome state, because AMN1 controls mitotic exit. Fusion
efficiency did not drop markedly as n was reduced (Extended Data
Table 1). RNA sequencing (RNA-seq) analysis revealed only minimal
perturbations in the n = 8 and n = 4 strains (Extended Data Table 4).
Because n = 2 does have a growth defect, we were unsurprised to find
a few substantial transcriptome changes, including six downregulated
genes (Fig. 3c and Extended Data Table 4). Strikingly, all these genes
are positioned near the four remaining telomeres (Fig. 3d). As
expression of the silencing genes SIR2, SIR3 and SIR4 was similar between
n = 2 and wild-type strains, we hypothesize that downregulation of
these six genes was due to enhanced telomere position effect (TPE).
Consistent with this, further analysis of gene expression near the
remaining telomeres showed reduced expression for genes within
20 kb of the telomeric repeats (Fig. 3e). The n = 2 strain also showed
some upregulated genes lying close to newly fused telomeres; after
fusion, when these genes are no longer subject to TPE, they are likely
to be transcriptionally de-repressed (for example, genes in
chromosomes IXR, VIL, VIIL and VIIIR; Extended Data Fig. 4). For the n = 4
and n = 8 strains, the few genes with substantially altered expression
were also telomere-proximal (Extended Data Table 4). The
significantly expressed subtelomeric genes overlapped extensively with
subtelomeric genes upregulated in sir deletion strains (Extended Data
Table 4). Finally, subtelomeric genes were highly enriched in genes
whose expression was significantly affected in the n = 2, n = 4 and
n = 8 strains (Extended Data Table 4).

Fig. 3 | Characterization of fusion chromosome strains. a,
dilution assays under seven different conditions with n = 16,
and n = 2 strains. HU, hydroxyurea; MMS, methyl methanesulfonate;
SC, synthetic complete medium; YPG, yeast extract peptone with 3%
glycerol. b, Competitive growth assays. Each experimental group was
tested in biological triplicate. Group 2 (n = 16 vs n = 4), P = 0.0048;
group 3 (n = 16 vs n = 2), P = 0.00013 (one-sided t-test). *0.01 < P < 0.05,
**0.001 < P < 0.01, ***P < 0.001. c, A volcano plot showing RNA-seq
data by comparing the transcriptomes of the n = 2 and n = 16 strains. Red
dots indicate genes whose expression was significantly different in the
n = 2 strain compared to the n = 16 strain (P< 10 fold change|
Some DNA replication stress response genes, including RNR3 and HU
are also upregulated in the n = 2 strain. P values derived from two-sided
t-test. YKL187C also known as FAT3; YLR2SIC also known as RSOSS.
d, Comparison of transcriptomes of n = 2 and n = 16 strains. Genes
located within 20 kb of remaining telomeres are shown in blue; those
within 20 kb of fused telomeres are shown in red; others are shown in
grey. e, 'Metachromosome' of remaining telomere plots. Top, schematic
diagram showing, in the metachromosome view, all telomeres aligned
on the left; we show expression changes of genes within 100 kb of the
nearest telomere. y-axis, log fold change in gene expression (comparing
transcriptome of n = 2 to n = 16); x-axis, distance of genes from the closest
telomere.

Karyotype engineering to reduce chromosome number without
affecting gene content revealed only mild changes in gross phenotypic
and global gene expression. It is well known that karyotypic changes,
including chromosomal rearrangement, polyploidy and hybridization,
contribute to post-zygotic isolation, especially in plants. About 100
million years ago, a whole-genome duplication (WGD) event resulted
in a modal yeast chromosome number shift from n = 8 to n = 16. The
major mechanism of chromosome number reduction in post-WGD
species with n < 16 and non-WGD species with n < 8 was inferred to
be via telomere fusion and loss of one centromere, resembling the
process intentionally engineered here.

Translocation events have been associated with reproductive isolation
in Saccharomyces species. A single naturally occurring reciprocal
translocation was estimated to reduce spore viability to 50% when
backcrossed to the wild type, owing to unbalanced segregation of essential
genes on those translocation regions, while for a non-reciprocal
translocation it was 75%. For chromosome-chromosome fusion, it is more
difficult to model the effects on spore viability. The question remains:
how much variation in chromosome number results in reproductive
isolation (operationally defined here as <1% viability)? Here, we are
Fig. 4 | Karyotype engineering leads to reproductive isolation.
a, Schematic of the experimental approach. The number inside the yeast
is the number of chromosomes. b, Sporulation efficiency of hybrid
strains. y-axis, percentage of asci with 0-4 spores; x-axis, diploid strains.
The number of asci counted is shown in parentheses. c, Spore clone
germination rates for hybrid strains. Representative images are shown. The
number of spores counted is shown in parentheses. d, Spore viability for
wild-type diploid BY4743 2n = 32, and isogenic diploids: 2n = 16 strain,
2n = 8 strain and 2n = 4 strain. Representative images are shown. The
number of spores counted is shown in parentheses.


able to empirically assess the impact of chromosome number in isolation,
because the strains represent an isogenic series (see Methods
for definition of isogenic as used here) with the parent strain, BY4741.
To explore reproductive isolation in this context, we mated each
BY4741-derived strain from the fusion chromosome series (n = 16
to n = 2) to SK1 (n = 16), an efficient sporulating strain, and
sporulated the diploids (Fig. 4a). All crosses readily produced viable zygotes,
showing no barrier to zygote formation. We confirmed that nuclear
fusion had occurred (Extended Data Fig. 5a), ruling out a block in
karyogamy. We examined both the success rate for meiosis, that is,
sporulation efficiency, and the successful segregation of one genome's worth of
information into each spore, monitored by germination efficiency. For
n = 16 × n = 16 crosses, more than 80% of asci developed 3-4 spores,
and overall spore efficiency (per cent of asci with at least one spore) was
97.2%. For the 'asymmetric' (n < 16 × n = 16) crosses, as n decreased,
sporulation efficiency dropped markedly. For n = 8 × n = 16,
fewer than 5% of asci developed 3 or 4 spores. For
matings, fewer than 1% of asci produced 3 or 4 spores (Fig. 4b).
To test germination of offspring spore clones, we used a fully isogenic
configuration, because spore clones that arise from n = 16 × n = 16
BY × SK1 hybrid diploids showed differences in spore clone colony
size, reflecting differences between the BY and SK1 backgrounds. We
crossed BY4741 strains (n = 16-8) to BY4742 (the isogenic n = 16
MATα partner of BY4741), and performed dissections of four-spored
tetrads. Whereas n = 16 × n = 16 gave rise to more than 90% viable
spore clones, n = 16 × n = 14 produced only 33.9% viable spores. In
n = 16 × n = 12, survival dropped markedly to below 10% (Fig. 4c,
Extended Data Fig. 5c). In n = 16 × n = 8, only very few tetrads formed
after 15 days of sporulation. We dissected 16 such tetrads and no spores
(of 64) were viable. This did not reflect defects in chromosome fusion
strains per se, as isogenic n = 8 × n = 8 diploids traverse meiosis as well
as n = 16 diploids (Extended Data Fig. 5b), giving rise to 98.4% viable
spores (Fig. 4d). Indeed, isogenic n = 4 × n = 4 and n = 2 × n = 2
diploids also sporulated well and generated viable spores (Fig. 4d,
Extended Data Fig. 5b). We observed 5.68% small colonies in spores
from the n = 2 × n = 2 cross only. Thus, the n = 8, 4 and 2 strains are
fully capable of performing meiosis and generating viable progeny, but
are incapable of generating viable progeny with the n = 16 strain. These
results indicate that eight chromosome-chromosome fusion events suffice
to result in virtually complete reproductive isolation. Reproductive
isolation comes from at least three sources. (1) Owing to problems with
nondisjunction in meiosis I, there is a steady rise in probability that a
spore will inherit a genome missing at least one chromosome's worth
of genetic information as n drops. (2) Even if a full complement of genes
is inherited, the probability of a lethal dosage imbalance is elevated in
asymmetric crosses. (3) Recombination between the concatenated and
native chromosomes is predicted to lead to the formation of deleterious
and genetically unstable dicentric (or multicentric) chromosomes.

We have shown that S. cerevisiae can survive with two chromosomes,
and grow relatively well with four chromosomes, at least under laboratory
conditions. Why does it have sixteen small chromosomes whereas
S. pombe can get by with only three larger ones?

We consider three possibilities for phylogenetic distribution of n.
First, the extant pattern of values of n in yeast could be a historical
product of a WGD event(s) and/or spontaneous chromosome
fusions and breaks. Saccharomyces (n = 16) and related genera arose
as a WGD event relative to a large cluster of 'preduplication' yeasts
(mostly n = 8). Second, a genome with 16 small chromosomes readily
becomes aneuploid, allowing rapid and reversible adaptation to severe
environmental changes. However, aneuploidy can also result in
deleterious imbalances. With few large chromosomes, aneuploidy
is more likely to be selected against, limiting potential for adaptation
to environmental changes. Finally, sub-telomeres are enriched in
'contingency' genes that encode functions related to the cell wall and
metabolism of different nutrients. These genes are transcriptionally
repressed under normal conditions but can be specifically expressed
in appropriate environments or conditions. A higher number of
telomeres may allow more elaborate fine-tuning of these properties,
improving the ability of a generalist species to adapt more rapidly to
diverse environments and stresses.

We have efficiently generated a series of chromosome fusions, building
a collection of strains with progressively fewer chromosomes,
including n = 2. Unexpectedly, yeast growth was robust, and mostly
indistinguishable from normal karyotype strains, even with chromosomes
up to four times the maximum size (excluding rDNA) of
wild-type strains, and with greatly reduced numbers of telomeres and
centromeres. The n = 2 strain grew slightly more slowly than the wild
type, perhaps owing to its large chromosomes or altered TPE. Meiosis
was strongly affected in matings between strains with different numbers
of chromosomes, suggesting that chromosome fusion and concomitant
reduction in chromosome number suffices as a reproductive barrier.
The strains described here may be used to probe aspects of meiotic
recombination, replication origin timing, the role of 3D nuclear structure
in transcriptional regulation in yeast, or recombination donor
preference, to name a few. Reproductive isolation may also prove to be
useful for future studies involving field release of engineered yeast or
other cases in which genetic isolation is desirable.

The n = 2 strain is reproductively isolated from the n = 16 strain,
but is it a new species? By classical biological species definitions
(post-zygotic reproductive isolation), it could qualify. However, by
phylogenetic species definitions (a group of organisms sharing an
ancestor), n = 2 is not a new species because it has near-zero sequence
divergence from n = 16, and its morphology is similar. Eight fusions
suffice to cause practically complete reproductive isolation, which, with
further neutral evolution, would allow such pairs of strains to accumulate
diverse mutations over time, leading to speciation by any definition.
METHODS
Strains. All strains that we constructed in this study are listed in Extended Data
Table 2. They are all derived from BY4741 (MATa his3Δ0 leu2Δ0 met15Δ0 ura3Δ0)
by transformation events. For sporulation efficiency measurement experiments,
the SK1 strain (MATα ho::LYS2 lys2 ura3 leu2::hisG his3::hisG trp1::hisG) was
used to mate with strains of different n. For the tetrad dissection experiments,
BY4742 (MATα his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0) was used to mate to strains with
different values of n. loxPsym sites were added at the sites of centromere
deletion and telomere fusion, anticipating future genome rearrangement experiments.
Calculation for potential paths to fuse 16 chromosomes into one chromosome.
The equation for this calculation is as follows:
$(32 \times 30 \times 28 \times \dots \times 2 \times 16)/2 = 1.097 \times 10^{19}$,
if we take into consideration final centromere choice (16 conditions).
In addition, we count the same configuration twice by arbitrarily defining left arm
and right arm. To give a simple example to explain this point, suppose there are
only two chromosomes, chr I and chr II: the final fusion chromosome chr IL-chr IR-
chr IIL-chr IIR is the same as chr IIR-chr IIL-chr IR-chr IL; thus we divide by
2 to account for this.
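
The arithmetic above can be checked with a short Python sketch. The function name and its default argument are illustrative only and are not part of the original study; the sketch simply evaluates the product as written (32 × 30 × … × 2, times 16 centromere choices, divided by 2 for orientation).

```python
def fusion_path_count(n_chromosomes: int = 16) -> int:
    """Evaluate (2n x (2n-2) x ... x 2) x n / 2 for n starting chromosomes."""
    product = 1
    # 32 x 30 x 28 x ... x 2, as written in the Methods for n = 16
    for term in range(2 * n_chromosomes, 0, -2):
        product *= term
    # multiply by the number of possible final centromeres, then halve
    # because each arrangement is counted once per left/right orientation
    return product * n_chromosomes // 2


total_paths = fusion_path_count(16)
print(f"{total_paths:.3e}")  # 1.097e+19
```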

CRISPR-Cas9 method. The CRISPR-Cas9 method for genome editing in S. cerevisiae
dramatically stimulates homologous recombination (HR) by co-expressing
Cas9 and site-specific guide RNAs together with transient linear PCR products as
HR donors. Here, we optimized this method to produce chromosome fusions.
To fuse chrs I and II, for example, we targeted breaks adjacent to the chr I right
telomere and chr II left telomere, and near CEN1 by Cas9 co-expression with three
specific gRNAs (Fig. 2a). We synthesized two donor molecules, the first encoding
homology to bridge the termini of these chromosomes, and the second a fragment
of DNA spanning CEN1 to delete it. Through successful HR at both sites we
generated a compound chromosome I-II with deletion of two telomeres (chr IR tel
and chr IIL tel) and one centromere, CEN1, in a single shot. The gRNA
telomere target site was chosen to optimize targeting specificity while minimizing
deletion of subtelomeric sequences to preserve genome content. At each stage, we
picked plasmid-transformed colonies and then screened by PCR for (a) presence
of appropriate telomere-telomere junctions and (b) deletion of appropriate
centromeric DNA (Extended Data Fig. 1a).

The CRISPR-Cas9 method was performed as described. In addition,
another gRNA acceptor vector was constructed from gRNA-ura-HYB (Addgene
plasmid #64330) by substituting the URA3 gRNA with a NotI restriction site.
pRS426 gRNA acceptor vectors have two gRNA insertion sites, one defined by a
NotI restriction site and another one by a HindIII restriction site. By using these
two gRNA acceptor vectors, up to 3 gRNAs could be co-expressed in yeast cells
when co-transformed. To insert 20 nt gRNAs, Gibson assembly was used to
assemble a restriction enzyme digested (for example, NotI or HindIII) and gel-
purified vector together with ~60 bp double-stranded oligonucleotides, which
consist of the 20 nt gRNA and homologous sequences to the left and right of the
vectors (for example, for the NotI vector, 5'-GCAGTGAAAGATAAATGATC-20nt
gRNA-GTTTTAGAGCTAGAAATAGC-3'; for the HindIII vector,
5'-CTGGGAGCTGCGATTGGCAG-20nt gRNA-GTTTTAGAGCTAGAAATAGC-3'). Single-
stranded oligonucleotides were ordered from Integrated DNA Technologies and
annealed by first denaturing the mixture at 95 °C for 5 min and then cooling to
10 °C with a ramp of 0.1 °C/s. The donor DNAs linking two different chromosome
arms together are made by two-step fusion PCR amplified from the wild-type
yeast genome or they were synthesized. These linker donors usually have ~400 bp
sequences homologous to each chromosome arm (sometimes shorter sequences
were used to avoid repetitive terminal sequences that would not target uniquely).
The same rule applies for the centromere deletion donors: ~400 bp flanking
centromere regions were designed on each side. The centromere deletion donors were
made by Polymerase Chain Assembly. Donor DNA amplicons were about 800 bp
long. The CRISPR-Cas9 method was performed stepwise: 1. We transformed a
Cas9-expressing plasmid. 2. We co-transformed 50 ng gRNA expression plasmids
and ~400 ng PCR amplicon donor DNAs into the strain already expressing Cas9.
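
The ~60 bp insert described above (20 nt of vector homology, the 20 nt gRNA, and 20 nt of downstream homology) can be sketched in Python. The flanking sequences are those quoted in the text; the function names are hypothetical, and the example spacer is the chr III right-telomere gRNA as listed in Extended Data Table 1. This is a minimal illustration under those assumptions, not the authors' design script.

```python
# Flanking homology as quoted in the Methods text.
LEFT_ARM = {
    "NotI": "GCAGTGAAAGATAAATGATC",
    "HindIII": "CTGGGAGCTGCGATTGGCAG",
}
RIGHT_ARM = "GTTTTAGAGCTAGAAATAGC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_grna_insert(spacer: str, site: str = "NotI") -> str:
    """Top strand of the insert: left homology + 20 nt gRNA + right homology."""
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be exactly 20 nt of A/C/G/T")
    return LEFT_ARM[site] + spacer + RIGHT_ARM


# Example spacer taken from Extended Data Table 1 (chr III right telomere).
top_strand = build_grna_insert("GAGCTACTATCTTTGTCGGG", site="NotI")
bottom_strand = reverse_complement(top_strand)
print(len(top_strand))  # 60
```

Ordering both strands and annealing them, as the text describes, would give the double-stranded fragment used in the assembly.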

Both pRS426 and pRS42H carrying 3 gRNAs in total were used, because we
found that using 3 gRNAs gave a higher yield of correct colonies than only using
2 gRNAs. Then, after co-transformation, yeast cells were plated on SC-Ura-Leu
(Synthetic Complete medium lacking leucine and uracil) with 300 µg/ml hygromycin
plates for more than 2 days. Yeast cells were usually grown for 4 h in YPD after
transformation to allow expression of hygromycin resistance, before plating on
selective plates with 300 µg/ml hygromycin B. The yeast clones with correct fusion
chromosome configurations were verified by PCR and pulsed-field electrophoresis.
Sometimes the SC-Ura-Leu + Hyg plate selection was too stringent, in that very few
transformant colonies were obtained. In that case, SC-Ura-Leu plates were used.
PCR verification of fusion chromosomes. To verify the fusion of two chromosomes,
primers were designed to bind outside of homology sequences to the linker
donors. Only clones bearing successfully reorganized fusion karyotypes generated
appropriately sized PCR amplicons, which were absent from failed clones or wild-
type control colonies. To verify the deletion of a centromere, primers were designed
to bind sequences flanking centromere regions. Typically the successful deletion of
centromeres produced amplicons ~120 bp shorter than wild-type control samples.
All verification primer information is included in Extended Data Table.
Pulsed-field gel electrophoresis. For n = 16 to n = 9 strains (Fig. 2b), yeast
chromosome plugs were prepared and separated by clamped homogeneous
electric field (CHEF) gel electrophoresis using the CHEF-DR II Pulsed-Field
Electrophoresis System (Bio-Rad), as previously described. The CHEF gel running
condition was 6 V/cm, switch time: 60 s to 120 s, run time: 24 h, 14 °C with
0.5× Tris-Borate-EDTA buffer and a 1% gel with low melting point agarose. For
n = 9 to n = 2 strains (Figs. 2c and 2d), a 7 ml culture of yeast was grown over 48 h
until it reached stationary phase. Then yeast cell pellets were collected by
centrifugation and zymolyase and agarose were added in proportion to cell pellet weight.
For i mg of cell pelt, 2425 m/l ymelyase 207 in 10 mM KPOM (pit 7.5) 
and 540410.5% low melting point agarose in 100 mM EDTA (pH 7.5) were added. 
Low melting point agarose was fully dissolved in the EDTA solution by microwave
heating and then kept at 42 °C. Cell pellets were resuspended with zymolyase and
agarose solutions by pipetting with a wide-bore pipette and transferred to Bio-Rad
molds (Bio-Rad No. 1703713). The molds were cooled in a 4 °C cold room for 30 min.
The plugs were released from the molds and added to 1 ml 500 mM EDTA, 10 mM
‘ris pH7-5 and incubated at 37°C overnight 8 sarcosyl and S mg/l proteinase 
Kin 00 mM EDTA pit 75 was added tothe plug the nextday and incubated at 
50°C overnight. The plugs were washed with I ml 2mM Tris-lmM EDTA butler 
four times for each, Sach lags could be used immediately or stored at °C for 
at least a year. To separate 1 Mb to 3 Mb chromosomes, we used H. wingei
chromosomal DNA as marker. H. wingei has 7 chromosomes, varying from 1.05 Mb to
3.13 Mb. For n = 9 to n = 4 strains (Fig. 2c), the CHEF gel electrophoresis condition
was that recommended for H. wingei chromosomes (Bio-Rad No. 170-3667). To
separate chromosomes longer than 3 Mb, we adopted a different electrophoretic
protocol, using S. pombe chromosomal DNA as a marker. For n = 4 to n = 2 strains
(Fig. 2d), the CHEF gel electrophoresis condition was that recommended for
S. pombe chromosomes (Bio-Rad No. 170-3633).

Calculation of maximum arm length. The condensation of a fusion chromosome,
chr IV-XII, in budding yeast has been studied by labelling TRP1 with red
fluorescence and LYS4 with green fluorescence. In mother cells, the distance between
TRP1 and LYS4 was ~0.4 µm during anaphase. With ~470 kb between these two
loci, extrapolating from this distance predicts that each 1 Mb of condensed
chromosome during anaphase will extend ~0.85 µm (0.4 µm/0.4686 Mb). Anaphase
spindle axis length can extend up to 10 µm, suggesting the maximum distance a
chromosome arm could extend is 5 µm. The maximum arm length limit, therefore,
should be about 5.9 Mb (5 µm/0.85 µm per 1 Mb).
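
The extrapolation above can be reproduced with a few lines of Python. This is only a sketch of the stated arithmetic; the function and parameter names are illustrative, and halving the spindle length to obtain the 5 µm arm limit follows the text's own reasoning rather than any independent measurement.

```python
def max_arm_length_mb(
    marker_distance_um: float = 0.4,      # TRP1-LYS4 distance in anaphase
    marker_separation_mb: float = 0.4686, # genomic distance between the loci
    spindle_length_um: float = 10.0,      # maximum anaphase spindle length
) -> float:
    """Estimate the longest chromosome arm (Mb) that fits half the spindle."""
    um_per_mb = marker_distance_um / marker_separation_mb  # ~0.85 um per Mb
    max_extension_um = spindle_length_um / 2               # 5 um, per the text
    return max_extension_um / um_per_mb


arm_limit = max_arm_length_mb()
print(round(arm_limit, 1))  # 5.9
```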

Serial dilution assays. Yeast strains were grown from single colonies in liquid
YPD culture until they reached the stationary phase at 30 °C with rotation. Then
culture was diluted to A600 = 0.1, and serially diluted (1:10) in water and plated on
different media. YPG plates were prepared by adding 3% glycerol as carbon source
to yeast extract peptone. All other compounds (HU, MMS, benomyl) were added
to YPD medium. Plates were incubated at 30 °C for 2 days except for YPG plates,
HU plates and MMS plates, which were incubated for 3 days.

Growth curve measurement. Yeast strains were grown from single colonies in
liquid YPD culture until they reached stationary phase at 30 °C with rotation. Then
culture was diluted to A600 = 0.05 and grown in 96-well plates with 100 µl YPD or
YPD + 0.2 M hydroxyurea for 48 h. Every 10 min, the plate was shaken and
a measurement of OD600 was taken by a BioTek Eon microplate spectrophotometer.
Doubling time was measured by calculating the slope of the growth curve
at exponential stage.
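
One common way to turn the slope of an exponential-phase growth curve into a doubling time is to fit a straight line to ln(OD600) against time and divide ln 2 by the slope. The Python sketch below illustrates that approach on synthetic data; it is an assumption about how such a calculation is typically done, not the authors' script, and the selection of the exponential window is left to the caller.

```python
import math


def doubling_time(times_min, od600):
    """Doubling time (min) from a least-squares fit of ln(OD600) vs time.

    Both sequences should cover only the exponential part of the curve.
    """
    logs = [math.log(v) for v in od600]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    growth_rate = sxy / sxx  # slope, in ln units per minute
    return math.log(2) / growth_rate


# Synthetic readings every 10 min for a culture doubling every 90 min.
times = list(range(0, 200, 10))
readings = [0.05 * 2 ** (t / 90) for t in times]
estimated = doubling_time(times, readings)
print(round(estimated, 1))  # 90.0
```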

Competitive growth assays. Competitive growth assays were carried out as
described. The GFP or dTomato cassette was integrated into the HO locus
through selection of nourseothricin-resistant cells. The GFP-labelled BY4741 was
co-cultured with dTomato-labelled BY4741 or n = 4 in 1:1 ratio, while the GFP-
labelled BY4741 was mixed with dTomato-labelled n = 2 in 1:25 ratio in YPD. The
next day, 30,000 total cells were sorted on a Sony SH800S Cell Sorter as T0. Cells
were diluted 200-fold in fresh YPD medium and sorted every 24 h three times.
FlowJo v10 was used to analyse the data.

Genome sequencing. Whole-genome DNA samples for sequencing were prepared
using a Norgen Biotek fungi/yeast genomic DNA isolation kit (Cat. No. 27300).
Whole-genome shotgun libraries were made on a Beckman FXP automation
workstation and prepared as follows: 500 ng genomic DNA, as input, was amplified by
2 cycles of PCR and sheared to 500 bp on a Covaris E220. A library was prepared
using KAPA High Throughput Library Preparation Kit (KK8234), and sequenced as
2 × 75 bp paired-end reads on an Illumina NextSeq 500. All raw reads were trimmed
to remove adaptor sequence using Trimmomatic, and subsequently mapped to
UCSC sacCer2 reference genome (S288C reference genome R61-1-1 20080605)
from Saccharomyces Genome Database using BWA-MEM standard options. BWA,
Picard, GATK and SAMtools software were applied to align reads to the transcriptome
reference and call SNPs/indels. The filtering criteria for SNPs using GATK were as
follows: QD < 2.0; FS > 60.0; MQ < 40.0; MQRankSum < −12.5; ReadPosRankSum
< −8.0; SOR > 4.0. The filtering criteria for indels using GATK are as follows: QD
< 2.0; FS > 200.0; ReadPosRankSum < −20.0; SOR > 10.0. The variants common
to the fused chromosome and the laboratory stock wild-type strain BY4741 were
removed and considered as starting spontaneous mutations. The remaining
variants were manually curated by browsing through the bam files in IGV. Native 2-µm
plasmids were absent from those sequenced fusion chromosome strains n = 12 to
n = 2, according to the WGS result.
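
The hard-filter thresholds listed above can be expressed as a small lookup table. The Python sketch below is illustrative only: it operates on a plain dictionary of annotation values rather than on a VCF file or the GATK tool itself, the function name is hypothetical, and skipping missing annotations is an assumption.

```python
import operator

# A variant FAILS an annotation when the comparison below is true.
SNP_FILTERS = {
    "QD": (operator.lt, 2.0),
    "FS": (operator.gt, 60.0),
    "MQ": (operator.lt, 40.0),
    "MQRankSum": (operator.lt, -12.5),
    "ReadPosRankSum": (operator.lt, -8.0),
    "SOR": (operator.gt, 4.0),
}
INDEL_FILTERS = {
    "QD": (operator.lt, 2.0),
    "FS": (operator.gt, 200.0),
    "ReadPosRankSum": (operator.lt, -20.0),
    "SOR": (operator.gt, 10.0),
}


def failed_filters(annotations, variant_type="snp"):
    """Return the names of the annotations that trip a hard filter."""
    rules = SNP_FILTERS if variant_type == "snp" else INDEL_FILTERS
    failed = []
    for name, (compare, threshold) in rules.items():
        value = annotations.get(name)
        # Assumption: an annotation that is absent is not counted as a failure.
        if value is not None and compare(value, threshold):
            failed.append(name)
    return failed


# Made-up annotation values, for illustration only.
good_snp = {"QD": 25.1, "FS": 1.2, "MQ": 60.0,
            "MQRankSum": 0.1, "ReadPosRankSum": 0.4, "SOR": 0.8}
strand_biased_snp = dict(good_snp, FS=75.0, SOR=5.2)
print(failed_filters(good_snp))           # []
print(failed_filters(strand_biased_snp))  # ['FS', 'SOR']
```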
RNA sequencing. For each strain, three independent colonies were grown in YPD
liquid medium at 30 °C with rotation to saturation. The next day, an overnight
culture was diluted to A600 = 0.1 and regrown. Cells were harvested at A600 = 0.4-0.
Total RNA was extracted from three independent log-phase cultures using a
QIAGEN RNeasy mini kit (Cat. No. 74106). The library was prepared with 500 ng
total RNA as input, using a TruSeq RNA Sample Preparation v2 kit (set A
RS-122-2001 and set B RS-122-2002) with 13 cycles of amplification. The library
was sequenced as 150 bp single-end reads on an Illumina NextSeq 500. Reads
were mapped to S288C reference genome (sacCer2) and differential gene expression
analysis was performed with TopHat and Cuffdiff according to a standard
pipeline. Briefly, trimmed reads were mapped to the reference genome using
TopHat and aligned reads with more than 2 mismatches were excluded from the
downstream analysis. RPKM were calculated using Cufflinks and differentially
expressed genes were analysed using Cuffdiff. Genes that have been deleted in the
fusion chromosomes were excluded in the analysis. The thresholds we used for
differentially expressed genes are P < 10-° and |fold change| > 2.
Definition of isogenic strain. Isogenic, as used here, refers to genomic regions
not deleted together with centromeres and telomeres. The deleted regions contain
a maximum of 21 verified genes (n = 2 strain) as defined by SGD, and a maximum
of 6% of the genome in the most extreme case, that is, comparing n = 2 and n = 16
strains, and consist primarily of the centromeres and telomeres themselves, and
some adjacent repetitive subtelomeric elements including X elements, Y' elements
and highly repeated PAU and COS genes.
Mating type switching. Mating type switching of fusion chromosome strains was
carried out by transforming a CEN plasmid, which expresses endonuclease HO
with its endogenous promoter, as described.
DAPI staining of fixed yeast cells. 10^7 cells were spun down from an overnight
culture. Pelleted cells were fixed by re-suspending in 70% EtOH and incubated for
2 h at room temperature. Cells were spun down, washed with water, re-suspended
in 500 µl RNase A solution (2 mg/ml RNase A (Qiagen, 19101), 50 mM Tris
pH 7.5 and 15 mM NaCl) and incubated for 2 h at 37 °C. Proteinase K (Invitrogen,
25530049) was added to a final concentration of 1 mg/ml and incubated for 45 min
at 37 °C. Cells were collected by centrifugation, washed with 1 ml of 50 mM Tris,
and stored in 1 ml of 50 mM Tris at 4 °C.

100-200 µl of fixed cells were pelleted by centrifugation, stained by resuspending
in 100 µl of DAPI solution (300 nM DAPI in 1× PBS) and incubated for 15 min.
Cells were spun down and the pellet was washed with 1× PBS, spun down again and
re-suspended in 1× PBS for imaging.

Images were collected with NIS-Elements acquisition software, with a 100× oil
objective lens using a Nikon Eclipse Ti microscope in both DAPI channel and bright-
field channel. Images were cropped in NIH ImageJ software with the Bio-Formats
plugin.

Sporulation and tetrad dissection. For the BY4743 diploid background, a diploid
clone was inoculated in YPD at 30 °C with rotation overnight. The next day, the
culture was diluted in YPD medium to grow at 30 °C for at least two generations to
A600 = 4-8. The cells were pelleted by centrifugation (2,000g for 2 min) and washed
three times with reagent grade water. Cell pellets were re-suspended in sporulation
medium (1% potassium acetate and 0.005% zinc acetate) with 0.1% (w/v) yeast
extract and amino acid supplements (0.3 mM histidine, 2 mM leucine and 0.2 mM
uracil), at a final A600 = 1.0, and incubated at 25 °C with rotation for 3-15 days
before tetrad dissection. To digest asci, 50 µl cell cultures were pelleted by
centrifugation (3,600 rpm × 3 min), re-suspended in 25 µl 0.5 mg/ml 20T zymolyase in
1 M sorbitol at 37 °C for 6-7 min, and diluted with 300 µl 1 M sorbitol. Tetrads were
dissected on a SINGER Instruments dissection microscope and germinated on
YPD plates at 30 °C for 2 days. yJL346 was used in this experiment as the n = 8
strain rather than yJL342.

For crosses of the hybrid SK1 × BY4741 background, diploid strains
were patched onto a YPG plate from −80 °C glycerol stocks and grown at
30 °C overnight (O/N), then patched onto a YPD plate containing 4% glucose
and grown at 30 °C O/N, inoculated into YPD liquid culture and grown at
25 °C O/N, diluted to A600 = 0.8 and then grown in BYTA pre-sporulation
medium (buffered yeast extract tryptone acetate; 1% yeast extract, 2% bacto
tryptone, 1% potassium acetate and 50 mM potassium phthalate) at 30 °C with
rotation for ~16 h. Cell pellets were collected by centrifugation (300 rpm × 5 min),
washed twice with 25 ml water, resuspended to A600 = 2.0 in sporulation
medium (0.3% potassium acetate, 5% acetic acid, pH ~6.5, with 0.3 mM
histidine, 2 mM leucine and 0.2 mM uracil) and incubated at 30 °C with shaking. The
sporulation efficiency of each strain was measured at 48 h on a ZEISS phase
contrast microscope under 60× magnification. For each strain, >100 cells were
counted to obtain the sporulation efficiency. We scored asci with no spores and
cells that did not go through meiosis in the same category (asci with 0 spores).
We calculated sporulation efficiency by dividing the number of asci with at least
1 spore by total cells. yJL346 was used in this experiment as the n = 8 strain rather
than yJL342.
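
The sporulation efficiency defined above (asci with at least one spore divided by all cells counted, with unsporulated cells pooled into the 0-spore class) can be written as a short Python function. The function name and the counts in the example are made up for illustration and are not data from the study.

```python
def sporulation_efficiency(counts_by_spore_number):
    """Fraction of counted cells that are asci with at least one spore.

    `counts_by_spore_number` maps spores per ascus (0-4) to the number of
    cells scored; cells that did not enter meiosis are counted under 0.
    """
    total_cells = sum(counts_by_spore_number.values())
    if total_cells == 0:
        raise ValueError("no cells were counted")
    asci_with_spores = sum(
        count for spores, count in counts_by_spore_number.items() if spores >= 1
    )
    return asci_with_spores / total_cells


# Hypothetical tally for one strain (120 cells scored).
example_counts = {0: 12, 1: 3, 2: 10, 3: 25, 4: 70}
efficiency = sporulation_efficiency(example_counts)
print(f"{efficiency:.1%}")  # 90.0%
```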
Statistics and reproducibility. Figures 2b-d, 3a and 4c, d and Extended Data
Figs. 1, 2a and 5a were repeated twice with similar results. Extended Data Fig. 2b
was done with biological quadruplicates.
Code availability. All codes used in this study are available at https://github.com/
sunnysuns5/fusion_yeast_chromosomes.
Data availability. We submitted DNA sequences and RNA sequencing data to
NCBI BioProject PRJNA471518. All other data are available from the corresponding
author upon reasonable request.
Extended Data Fig. 1 | Fusion chromosome paths and characterization.
a, PCR verification of fusion chromosomes. Two pairs of primers are used
to verify the fusion of two chromosomes, here presenting the fusion of
chromosomes I and II. Only with successful fusion would an amplicon
show in fusion chromosome junction PCR. In addition, the centromere
region PCR amplicon is shorter, owing to the deletion of a centromere.
Marker: 2-log DNA marker. b, Circos diagrams for different values
of n are shown in the upper panel, with 16 chromosomes laid out in
circles in a clockwise manner. Each grey line connecting each pair of
chromosomes represents a possible intertelomeric link. The dashed
coloured lines represent the path we chose. (Open circle indicates the
starting chromosome, arrows show direction of fusion chromosomes.)
Each fusion chromosome is labelled in the same colour, providing
detailed information on fusion chromosome paths. The underlined
chromosome has the active centromere for the fusion chromosome.
Unchanged chromosomes are not shown, but the number of unchanged
chromosomes is indicated after +, for example, +9 means nine unchanged
chromosomes. Fusion chromosome lengths are shown below. Please
note that the length of the rDNA array (normally 1-2 Mb) is omitted
here. *Chromosome containing rDNA array. In addition, telomeric
ends are clearly labelled with the original chromosomes; L/R stands for
left/right telomere. Active centromeres are written on the right of each
fusion chromosome diagram. When we deleted CEN15 and made 4 bp
of mutations in gRNA sites in BY4741 to fuse chromosomes X and XV
together, the fusion strain grew more slowly than the wild type. The
growth defect could be due to altered expression of the gene neighbouring
CEN15 (YOR001W; RRP6). For this reason, CEN15 was maintained as the
remaining centromere. c, Fusion chromosome paths from n = 4 to n = 2.
Red lettering indicates the chromosome that has an active centromere
in the compound chromosome. d, Multiple strategies that we attempted
to construct an n = 1 strain. We tried changing chromosome arm length
(strategy 3), centromere position (strategy 2), and which telomeres
remained (strategy 1 and 2). In strategy 2, we attempted both versions of
n = 1 strains, keeping either CEN7 or CEN15 as the active centromere.
In addition, we also attempted strategy 1 in both SIR2+ and sir2Δ
backgrounds.
Extended Data Fig. 2 | Growth fitness assays. a, Serial dilution assays in
seven different conditions with n = 16 through n = 4 strains. b, Growth
curve for BY4741, the n = 4 strain and the n = 2 strain in YPD medium
at 30 °C. c, Doubling time calculation for these three strains. Each
experimental group was tested in biological quadruplicate.
Panel labels: YPD + 0.2 M HU; YPD + 0.01% MMS; YPD + 15 µg/ml benomyl.
Average doubling time (min) for BY4741, n = 4 and n = 2, respectively:
YPD, 135, 141, 182; YPD + 0.2 M HU, 222, 208, 28.
Extended Data Fig. 3 | Whole-genome coverage maps and SNPs
comparison. a, Whole-genome coverage map of BY4741. Chromosomes
are arranged according to their size. x-axis shows chromosome coordinate,
and y-axis shows the coverage of reads that map to the reference genome
of S288C. b, Whole-genome coverage map of n = 2 strain. For comparison,
we mapped the reads back to the S288C genome. The main difference is
telomere reads. Owing to deletion of telomeres, either no reads
or only few reads align back to fused telomere ends. c, Each round of
CRISPR-Cas9 experiments takes at least 70 generations. The nucleotide
mutation rate is $0.33 \times 10^{-9}$ per base per generation. If we multiply the
number of rounds of CRISPR-Cas9, $(16 - n)$, by 70 generations, by the yeast
genome size ($12 \times 10^{6}$ bp) and by $0.33 \times 10^{-9}$, it gives the expected number of
SNPs as indicated in the table.
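
The expected-SNP estimate in panel c can be sketched in Python as follows. The function name is illustrative; the defaults are the values quoted in the legend, and the mutation-rate exponent (10^-9) is reconstructed from a garbled source, so it should be treated as an assumption.

```python
def expected_snps(
    n: int,
    generations_per_round: int = 70,
    genome_size_bp: float = 12e6,
    mutation_rate: float = 0.33e-9,  # per base per generation (reconstructed)
) -> float:
    """Expected spontaneous SNPs after (16 - n) rounds of CRISPR-Cas9 fusion."""
    rounds = 16 - n
    return rounds * generations_per_round * genome_size_bp * mutation_rate


for n in (12, 8, 4, 2):
    print(n, round(expected_snps(n), 2))
```

Under these assumptions the n = 2 strain, which is 14 fusion rounds away from n = 16, would be expected to carry roughly four spontaneous SNPs.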
Extended Data Fig. 4 | Chromosome plots of gene expression change in n = 2 vs n = 16 strains. y-axis, log2 (fold change of gene expression
(n = 2/n = 16)); x-axis, chromosome coordinates. Red arrows point to the four remaining telomeres in n = 2 strains.
Extended Data Fig. 5 | DAPI staining of nucleus and sporulation
efficiency for diploids. a, DAPI staining of nuclei in diploids.
b, Sporulation efficiencies for 2n = 32, 2n = 16, 2n = 8 and 2n = 4
homotypic diploids. y-axis, percentage of asci that have 0-4 spores.
x-axis, different diploid strains. Sporulation efficiency was measured after
shifting diploid strains in the sporulation medium for 10 days in room
temperature. c, Fractions of asci with 0-4 viable spores in heterotypic and
homotypic crosses.
Extended Data Table 1 | gRNA sequences and fusion efficiency of each fusion step

Fusion step   Deletion   20 nt gRNA (5'->3')   Deletion coordinate   Number of colonies   PCR verification
chr lil right_telomere  GAGCTACTATCTTTGTCGGG 299737 - 316817 
1 chrileft_telomere  CTCAATGTACGCGCCAGGCA 11-6770 10" 4 
CENt AAGAAAGTIATATGTGTGAC 151467 - 151578 
‘chr X_right_telomere  GTTGAGAGACAGGATGGTTA —_43g045 - 439885 
2 chrii_left_telomere  CCATTTGGAGTCTGCTCGGC —1 - 3582 NA NA 
CEN TGAATAAGTTGAAGGAGAAC _355626 - 355853 
chr V_right_telomere  TGCTAATCGTCTCTGCGGTT —_S69244 - 576869 
3 chr VilLlef_tolomere | CCATTTTTTATCCTTATGCT 41-6182 1 1 
CENS TTTAGTTGAAACGCCAACAG _151813 - 152187 
chr VIright_telomero  GAACTGTGCATCCACTCGTT ~—>69509.970148 
4 chri_left_telomere —- GGGTTAACTACGCTATAGAC 11-8703 s 2 
CENG CTAAAACTGTCTITICGTGT __148507-148624 
‘chr Xill_right_telomere  CTGATAAAAGCAGGTGCCCT —_gygg60.924429 
5 chr XIVleft_telomere  TCTGATCGGTCATACGTACA —1-7493 ie 1 
CEN14 CTTCATCAAGATCGGGGAGC _628758-628876 
‘chrXVLright_telomere TGGTGTTATATAGTGGCACC 94191-48062 
6 chriXeft_telomere  TTTGCCACCACCTGGGCGGG 4.20857 4 i 
CEN16 TTAGAATTACGAGAACATAA __555959-556070 
‘chr Vill_right_telomere  CATCCGTGTGCGTATGCCAT —_55g465-562643. 
7 chr left_telomere — TAGTGGCTCGCACTCATGCG ~—1-7960 10 2 
CEN10 TIGTTATACACAACGCGTCT _436301-436421 
chrLright_telomere  ACCTGCGCGGCGCGGCGGTT __227349-230208 
a chr Viloft_ telomere CTTTGTATGAGGGTACATCA —1-6185 48° 2 
‘CENZ CTTIGCGTGGTCTAGTGCAT __238211-238325 
chr Vil_right_telomere — CTTTTACAACAACCGCCATA —4ggg0s9-1090947 
9 chr Xill_feft_telomere — ATAGCGGCGGTACGTTTGTT —1-8585. 20" 7 
CEN13 CACTATTTATITTACTATGG. 268036-268410 
chr IV_right_telomere — TTGTGGTAGCAACACTATCA ——ys94597-4551919 
10 chr Xil_lef_telomere  GTTGTGTTGAGGTACTGTGT — 11-1185 8 2 
CENT2 AGGAGAAAACTTGTAGTACG —_450818-150946 
chr XLright_telomere  CTGTTTAGACACTTGCGTCA 54993-86454 
" ‘chr Vil_teft_telomere  AAAAGGATCTATCTCCCGCT —1-4700 16 9 
CENI1 TATTTAGTATTGGACCATTG __439774-430891 
‘chr X_right_tolomere — CATCCAGATCGGAAACGCTA —_743437-745742 
2 chrXV_left_telomere  AGGTAAGGATACGGGGATAT —_1-5056 2 1 
‘CENB TATTATACTAAATCGTITTG. +108580-105700 
‘chril_sight_telomere — CGTAACGTTGGTTCACATTT —_ 1 4675-813178 
13 ‘chrX|left_telomere  AAATGAAGAAGTGCCATGGG 1.2951 I 
‘CEN3 TAAGCGGAAGGGGAAGGGTT__114487-114389 
‘chr XV_right_telomere  CTGCATCGTCTCGCTITGCC ~—_4073916-1091289 
14 ‘chrIV left_telomere  CTTGCAACATCGCGACAGGT —1-5566 195" Zout of 95 
‘CEN4 TCGGCATTTITGGCCGCTCCT —_44g799.449819 
“Seton pln ar S0-Ura- Le therm ae C-Uo-Lu sy, The same fUSPR-Cae8caprimentas fusion ste 14 mar lo cari out nan = 16 stra bnchground. Asa es 7 elon 
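Because the table above was recovered by text extraction, some gRNA entries contain characters that cannot be DNA. The following is a minimal sketch of a sanity check, assuming only that every guide should be 20 nt of A, C, G or T as the column header states. The function name `check_grna` and the row labels are hypothetical; the three sequences are copied verbatim from the extracted table.

```python
VALID_BASES = set("ACGT")

def check_grna(seq, expected_len=20):
    """Return a list of problems found in one gRNA string (empty list = looks fine)."""
    problems = []
    if len(seq) != expected_len:
        problems.append(f"length {len(seq)} != {expected_len}")
    bad = sorted(set(seq.upper()) - VALID_BASES)
    if bad:
        problems.append("non-ACGT characters: " + "".join(bad))
    return problems

# Sequences copied as extracted; labels are illustrative only.
rows = {
    "step 1 telomere gRNA": "GAGCTACTATCTTTGTCGGG",
    "step 1 CEN gRNA": "AAGAAAGTIATATGTGTGAC",
    "step 14 CEN gRNA": "TCGGCATTTITGGCCGCTCCT",
}

report = {label: check_grna(seq) for label, seq in rows.items()}
for label, problems in report.items():
    print(label, "->", problems or "ok")
```

Entries flagged by such a check should be compared against the published table before any guide is reused.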
Extended Data Table 2 | Summary table of fusion chromosome strains


7 VILSTS___ GhrOSR was fused to chOIL, wih MATa his3A01eu2A0 mettS40 uradao I (cena) 15 
cent deleted 
2 W374 ChrO9R was fused to chrO3L, with MATa his3A0 eu2A0 met15A0 ura3A0 IX: (cen3) 14 
end deleted 
3 L282 _ChrO5R was fused to chrOBL, wth MATa his leu2A0 mett540 ura3A0 KX (cen3) 13 
cenS deleted vevilicen8) 
4 L303 ChrO6R was fused to chO2L, with MATa his leu2A0 met15A0 ura3A0 DXi: (cen3) 12 
cen6 deleted Vevil(cen8) Vi:l(cen2) 
5 yL310 Chrt3R was fused to chrt4L, with MATa his340 leu2A0 met1540 ura3A0 WK: (c@n3) 11 
cent4 deleted VeVil(cen8) Vien?) XII:XIV(cen 13) 
6 L920 Chr16R was fused to chrO8L, wth MATa his3A0 leu2A0 met1540 ura3A0 XVI! 10 
cent6 deleted (cen3) V.Vili(cen8) Viil(cen2) XILXIV(cen13) 
7 yL336-—ChrO8R.was fused to chr10L, with MATa his3A0leu2A0 met15A0 ura3A0 XVII 9 
cen10 deleted (cen3) V.VIILX(cen8) VI:lcen2) XIILXIV{cent3) 
8 L342 ChrO1R was fused to chrO6L, with MATa his30 leu2A0 met540 ura340 8 
cen? deleted XVEIXAELVEN (cen3) V:VIEX(cen8) 
ILXIV(cen13) 
8 L346 ChrO7Rwas fused to chri3L, with MATa his3A0 feu2A0 met1540 ura3A0 XVIII) 8 
cent3 deleted (cen3) Vil(cen2) V-VIX(cen8) VILXIIXIV(cen7) 
8 L358 ChrO7R was fused to chri3L, wth MATa his3A0 leu2A0 met5A0 ura3A0 z 
cent3 deleted XVEDKHEEVEN (con3) V-VIEX(cen8) 
VIxIl:xIV(cen7) 
10 L969 ChrO4R was fused to chr12L, with MATa fis3A0 leu2A0 met?5A0 ura3A0 6 
cent2 deleted XVEIXIMELVLI (cen3) IV-XIcen4) V.VINX(cen8) 
ViXII-xIV(cen7) 
1" W375 Chr1 1Rwas fused to chrO7L, with MATa fis340 leu2A0 met540 ura340 5 
cent deleted XVEIXIEVLI (cen3) IV-XI(cen4) V.VINX{cen8) 
XEVIXILXIV(cen7) 
2 L378 Chr OR was fused to chr1SL, with MATa his3A0 leu2A0 met1540 ura340 4 
cenB deleted XVEDCIIELVLN (cen3) 1V-XIK(cen4) 
V-VIIXXVi(cer45) XL-VIEXILXIV(cen?) 
3 WL381 —ChrO2R was fused to chrt1L, with MATa his340 leu2A0 met?540 ura3A0 3 
end inactived XVEDXIELVEIEXLVILXULXIV (cen7) 1V:xN(cen4) 
V-VIIX:XV(cent5) 
2 yWL410 Chri SR was fused to chrO4L, with MATa fis340 leu2A0 met1540 ura340 2 
cend deleted with parental strain: V-VIIEX:XVIV-Xil(cen18) XVEDKILE-VEI (c2n3) 
yL979 XLVIXILXIV(cen7) 
“4 L402 Chri SR was fused to chO4L, with MATa his340 leu2A0 met1540 ura3A0 2 
end deleted with parental strain. XVIDCIEVEILXIVIXILXIV (cen7) 
wL381 V-VIMX-XV:IV.XINc@n15) 
Extended Data Table 3 | Variants identified from genome sequencing of different n strains


T__Gronesone _Postan Roternco Tatton ORF (amine aids tao 
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ssr403 A c ANI (YBRISBW, L317) 
ARTES * c Newco 
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ae rere 5 Teva Tang en 
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Cv 614706 rn c “ADEGYGROGTC, 04226) 
Crk 640070 c n “ATPA(YIRT2I ASAE) 
a ToaTGR c Norrcodng 
Extended Data Table 4 | Gene expression changes in n = 2, 4 and 8 strains


Not Ne 


GEX2", YRF1-4", DLD3", 
YLRA60C*, AIF1*, YELO7SC, 
‘YELO77C", RMO6", COS7* 


Expression RMDS*, YELO77C*, YLR460C*, AIF 
decreases costo" 


Dios", ‘YURAB0G*, YORSS1C*, 


cos7* 


YKLI87C, SPG4, AADI0", RPSOA, Y.ILOASW, 
‘YLR2B1C, COS8", OYES, HXTS, IMD2" 
Expression HSP12, HM, HBT!, PHOT", SPG1, GRE, 


YER186W", Cost 
vinos2c*, Cosi, cos4", 


imp2", 


YiRo#ac", cos1*, cosa" 


Increases ‘SIP18, YIR042C*, PIR3, FMP45, GND2, Vea ‘YER186W*, IMD2", COS4* 

b
Gene/Systematic name    n = 2               n = 4               n = 8
COS4/YFL062W            significant         significant         significant
YFR057W/YFR057W         significant         significant         significant
YPS5/YGL259W            significant         significant         significant
IMD2/YHR216W            significant         significant         significant
YGL258W-A/YGL258W-A     significant         not significant     significant
GEX1/YCL073C            significant         significant         not significant
VBA3/YCL069W            significant         significant         not significant
PAU4/YLR461W            significant         not significant     not significant
FDH1/YOR388C            significant         not significant *   not significant *
THI5/YFL058W            not significant     not significant     not significant
AAD15/YOL165C           Removed by design   Removed by design   not significant *
Fisher's exact test     P value = 9.34 × 10^-12   P value = 1.16 × 10^-6   P value = 8.68 × 10^-7

c
                          Significantly changed   Significantly changed   Significantly changed   All genes
                          in n = 2 strain         in n = 4 strain         in n = 8 strain
Subtelomeric genes        18                      16                      9                       325
Non-subtelomeric genes    a                       0                       0                       6360

P value (Fisher's exact test)   2212x010          2.20 × 10^-16           1.729 × 10^-12
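The enrichment test in panel c can be sketched in a few lines. This is a minimal illustration and not the authors' analysis code. It assumes that the "All genes" column gives 325 subtelomeric and 6,360 non-subtelomeric genes, and that no non-subtelomeric gene changed in the n = 8 strain (that cell of the extracted table is only partly legible). The function name `fisher_one_sided` is hypothetical, and the test is computed directly from the hypergeometric distribution.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the count
    in the top-left cell with all row and column totals held fixed.
    """
    row1 = a + b              # e.g. all subtelomeric genes
    col1 = a + c              # e.g. all significantly changed genes
    n = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)

# n = 8 strain, under the assumptions stated above:
# 9 of 325 subtelomeric genes changed, 0 of 6,360 non-subtelomeric genes changed.
p_n8 = fisher_one_sided(9, 325 - 9, 0, 6360)
print(f"one-sided P for the n = 8 strain: {p_n8:.3g}")
```

Under these assumptions the result is on the order of 10^-12, the same order of magnitude as the tabulated value; the exact figure depends on the gene totals and on whether a one- or two-sided test is used.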


Thomsen 1 can in 20 Conparaan ol ieenresion changer 2] saulamere ene wih changer Sv2a, S33 and Sida cain elfen one aber 
bys YELOGHETELOEEW a ELOSEC ne pte he let acs“ aces Te Pras as alee by monet erste, change vcore 
‘at omerecarepnfeaninn=2¢ ana cae eeitd by nese Pate senna 
Extended Data Table 5 | Primers for PCR verification of fusion chromosomes

Fusion step   Purpose                  Forward primer (5'-3')       Reverse primer (5'-3')

1             chr III-I junction       TGAAAGGCACTGCAACATCTG        GGCTCAATCATCTACCGCATGG
              CEN 1 deletion           CAACCAAACGTCCTCTTCTCTC       ACGATACATGGACTGACTCAAG

2             chr IX-III junction      TTAAGGTGCGACCGGCAATG         CCTGTTTCGGCACTTGAGTC
              CEN 9 deletion           CAACGAATTTCTCTCCGCCAGG       CACTTCAACAGTGCCAAAGACTCTAC

3             chr V-VIII junction      TACGCCAAGTCGGTCAGGTC         TCCGAACTTGGTGTGTCTTCAG
              CEN 5 deletion           ACCTCCTAGCACTTCGTAATG        GCTATTTATGTGCGGCTTTGTC

4             chr VI-II junction       CAGATCCTTTCGCATTCCTACTTG     TCGTCGATGGTATTGGTGTAGAG
              CEN 6 deletion           TTGGGCGATGGAAGAGGTAAAG       ACTTTCAACGCAAGAGCAAGAC

5             chr XIII-XIV junction    CCAAGACTCTCACCTGCGAC         GCAATGGCTCAGTAACCTCG
              CEN 14 deletion          CTGATGGACTCCGTAGAGAGC        AGGGTAGCATAAACCTGCTG

6             chr XVI-IX junction      ACATTTGGGCCGTTGCTAGAAG       TATAGTCGGGCCTAGTTGCACTC
              CEN 16 deletion          GGTTGAAGGAGTTAGTTTGTCG       GCCGCTTTGATGATTCTGCTTTAG

7             chr VIII-X junction      GAACCGCTTCTGCTCAACTAG        CAATGACGGTGTTCGTGAAGC
              CEN 10 deletion          CTCAGAAGGGAATTTCGTAAGC       CCAGTTTAGTTGTTGTGGATGC

8             chr I-VI junction        CGTCAGCAGCGTCAGTAACTC        GCTGCAACAACTTCCCAATCATG
              CEN 2 deletion           GGACTGAAAGCCAGTAACAAGC       TTCTCGTACCAAGCCGGTTC

9             chr VII-XIII junction    AAAGTTTCCACCAGACGCTAAG       CTACACTCGAACTCTGTTTCTCTC
              CEN 13 deletion          AGGCTTTCGATTACCATGTGC        CTAAGGTAGCCAGAACTTCTCATC

10            chr IV-XII junction      GCGTGACTTCTAAGAACAAGACTGC    ATGGTGAGAGATGGGTGATGGAG
              CEN 12 deletion          GACAACCAAACTGGTGTATGC        TGCCATCATCTACTTCCTTTCC

11            chr XI-VII junction      GCGAAAGCGAAGCCAATGTG         GCCATTAGCCTTCTATGTGTTC
              CEN 11 deletion          GAACGACATTAACGGATACGCAAC     TGAAGAAGGTCAACATGAGGATGG

12            chr X-XV junction        AATGCTGTGACACGCAGATAC        GGTACGCTCACCTCGTAAGTC
              CEN 8 deletion           ACCCTCAGGTTGCTATGACG         ACGCACGAGCGAATTAACATTCC

13            chr II-XI junction       GCTGCAACAACTTCCCAATCATG      TTTGCCAACACGAAAGGAACTC

14            chr XV-IV junction       GGTAGTAAGCAACTCGTATCCCTG     TGGCATTCCTCTTTCACTTTCGTC
              CEN 4 deletion           AGTGGTTGACATGCTGGCTAG        GGCCTCAAGAAAGAAACCTCTATG


‘Centar 3 ar schol the etn aay 3 bp CGS Sane searing waz uaa vary he lation of CE ater han PCR amen 
Gamete fusion triggers bipartite transcription factor
assembly to block re-fertilization


Aleksandar Vještica, Laura Merlini, Pedro Junior Nkosi & Sophie G. Martin


The ploidy cycle, which is integral to sexual reproduction, requires
meiosis to halve chromosome numbers as well as mechanisms that
ensure zygotes are formed by exactly two partners¹⁻⁴. During sexual
reproduction of the fungal model organism Schizosaccharomyces
pombe, haploid P and M cells fuse to form a diploid zygote that
immediately enters meiosis. Here we reveal that rapid post-fusion
reconstitution of a bipartite transcription factor blocks
re-fertilization. We first identify mutants that undergo transient
cell fusion involving cytosol exchange but not karyogamy, and show
that this drives distinct cell fates in the two gametes. The P partner
undergoes lethal haploid meiosis, whereas the M cell persists in
mating. The zygotic transcription that drives meiosis is rapidly
initiated first from the P parental genome, even in wild-type cells.
This asymmetric gene expression depends on a bipartite complex
formed post-fusion between the cytosolic M-cell-specific peptide
Mi and the nuclear P-cell-specific homeobox protein Pi⁶,⁷, which
captures Mi in the P nucleus. Zygotic transcription is thus poised to
initiate in the P nucleus as fast as Mi reaches it after fusion, a design
that we reconstruct using two synthetic interactors localized to the
nucleus and the cytosol of two respective partner cells. Notably,
delaying zygotic transcription (by postponing Mi expression
or deleting its transcriptional target in the P genome) leads to
zygotes fusing with additional gametes, thus forming polyploids and
eventually aneuploid progeny. The signalling cascade to block
re-fertilization shares components with, but bifurcates from, meiotic
induction⁸⁻¹⁰. Thus, a cytoplasmic connection upon gamete fusion
leads to asymmetric reconstitution of a bipartite transcription
factor to rapidly block re-fertilization and induce meiosis, ensuring
genome maintenance during sexual reproduction.

While studying fusion-defective mutants¹¹, we discovered an
interesting post-fusion asymmetry between mating partners. Mating
mixtures of cells that lack the p21-activated kinase Pak2 (also known as
Shk2) showed many unfused partners and a prolonged actin fusion
focus lifetime¹² (Fig. 1a, Extended Data Fig. 1a). pak2Δ matings also
produced about 10% aberrant asci, apparently either derived from
three cells and containing more than four spores (Fig. 1a, type IIIa),
or derived from a single cell that underwent sporulation (Fig. 1a, type
IIIb). Three lines of evidence revealed that these asci are the result of
meiosis and sporulation taking place in haploid cells.

First, live-imaging showed that aberrant asci were formed upon mating
of partners that had not undergone karyogamy. The spore-forming
partner displayed a long microtubule bundle, characteristic of meiotic
prophase¹³,¹⁴, followed by two rounds of spindle formation,
indicative of meiosis I and meiosis II (Fig. 1b, Extended Data Fig. 1b,
Supplementary Video 1a). The other cell maintained interphase
microtubules and the ability to mate with another partner, thus
yielding type IIIa asci. Second, fluorescent labelling of a chromosomal locus
revealed that spore-forming cells contained only two loci distributed
between up to four spores, which indicates aneuploidy (Extended Data
Fig. 1c). By contrast, every wild-type spore contained one fluorescent
locus (Extended Data Fig. 1c; see also ref. 15). Third, flow cytometry
analysis of DNA-stained spores showed pak2Δ spores with sub-1C
DNA content (Fig. 1c). Thus, aberrant pak2Δ asci form upon meiosis
and sporulation in a single, haploid cell.
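The arithmetic behind the sub-1C and two-loci observations can be written out explicitly. The sketch below is only an illustration: it assumes one round of premeiotic DNA replication and, for the mean, an equal partition of DNA among four spores. The function names are hypothetical.

```python
from fractions import Fraction

def mean_dna_per_spore(ploidy_c, spores=4):
    """Mean DNA content per spore, in C units, after one round of
    premeiotic replication followed by partition among `spores` spores."""
    replicated = 2 * Fraction(ploidy_c)   # S phase doubles the DNA content
    return replicated / spores

def max_spores_with_locus(copies_before_replication, spores=4):
    """Upper bound on the number of spores that can inherit a tagged locus."""
    return min(spores, 2 * copies_before_replication)

diploid = mean_dna_per_spore(2)   # meiosis in a normal diploid zygote
haploid = mean_dna_per_spore(1)   # meiosis in a single haploid cell

print(f"diploid zygote: {float(diploid)}C per spore on average")
print(f"haploid cell:   {float(haploid)}C per spore on average")
print("spores that can carry the locus after haploid meiosis:",
      max_spores_with_locus(1))
```

A haploid cell that replicates once and then divides twice therefore yields spores that average less than 1C, and at most two of them can carry a given single-copy locus, consistent with the flow cytometry and locus-labelling results described above.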

Haploid meiosis was provoked by transient cell fusion: pak2Δ cells
that produced aberrant asci initially fused and exchanged cytosolic
GFP, but their fusion pore resealed, as indicated by persistent unequal
levels of GFP between partners (n > 20, Fig. 1d, Extended Data Fig. 1d,
Supplementary Video 1b, c). Transient fusion was required to induce
haploid meiosis because fully blocking fusion in formin fus1Δ¹²,¹⁶
mutants prevented aberrant asci formation (Extended Data Fig. 1f).
Pak2 did not regulate meiosis directly, as pak2Δ diploids sporulated
normally (Extended Data Fig. 1g). In crosses between wild-type and
fus1 cells, which exhibit partial fusion defects¹²,¹⁶, we also observed
transient fusions followed by haploid meiosis in one partner, whether
this was fus1 or wild type (Extended Data Fig. 1f, h, i, Supplementary
Video 1d). We conclude that, independent of genotype, transient cell
fusion induces meiosis in one partner cell. This is consistent with
observations that the merging of M and P cytosols induces meiosis
independently of ploidy or karyogamy¹⁷⁻¹⁹.

Transient fusion induced haploid meiosis strictly in the P cell (in 64
out of 64 cases for pak2Δ, and in 9 out of 9 cases between fus1Δ and
wild type). In transiently fusing pairs, the master meiotic regulator
Mei3²⁰ was induced only in P cells, as assessed by tagging mei3 with
a fast-folding GFP variant (sfGFP)²¹ (Fig. 2a, Supplementary Video 5).
M genome mei3-sfGFP produced fluorescence only upon complete
fusion (Fig. 2b, Supplementary Video 2a). Importantly, mei3 was also
asymmetrically expressed in wild-type zygotes: Mei3-sfGFP
expression was significantly more rapid when encoded in the P genome
than in the M genome (Fig. 2c, Extended Data Fig. 2a, Supplementary
Video 2b). Thus, zygotes first express the meiotic inducer Mei3 from
the P genome.

Mei3 expression is under the regulation of two cell-type-specific
factors, the P-specific homeodomain protein Pi and the M-specific
42-amino-acid peptide Mi⁶,⁷. We found that Pi and Mi formed
complexes insensitive to nuclease treatment both in vivo and when
expressed in bacteria (Fig. 2d, e). Mi co-expression also stabilized Pi
levels (Fig. 2d). Chromatin immunoprecipitation experiments further
showed that Pi and Mi associate with the mei3 promoter, but only when
both proteins are present (Fig. 2f, Extended Data Fig. 2c). Thus, the
Pi–Mi complex, but not its individual subunits, binds chromatin and
induces mei3 expression.

To test whether Pi and Mi govern the asymmetric mei3 induction
from parental genomes, we swapped their coding sequences. This swap
produced a phenotype reversal, with Mei3 expressed first from the M
genome (Fig. 2g, Extended Data Fig. 2, Supplementary Video 3a) and
transient fusion inducing haploid meiosis in M cells (in 13 out of 14
transient fusions, Extended Data Fig. 2d, e, Supplementary Video 3b).
The swap did not affect mating and sporulation efficiencies nor spore
viability (Extended Data Fig. 2f–h). Thus, asymmetric zygotic mei3
transcription is governed by Mi and Pi, with the Pi-expressing genome
inducing mei3 first.

We visualized Pi and Mi tagged with sfGFP at their N- and C-termini,
respectively, expressed from their native genomic loci. Mi-sfGFP
Fig. 1 | Transient cell fusion induces haploid meiosis and aneuploidy.
a, Quantification (right) of shown phenotypes (left) in homothallic
wild-type and pak2Δ matings. *P = 4.81 × 10^… (n = 3 experiments, with
n = 500 cell pairs each), Welch's test. WT, wild type. b, Homothallic
pak2Δ mating cells expressing GFP-α-tubulin (green) and nuclear
Uch2-mCherry (magenta). Note the lack of karyogamy in the outlined
pak2Δ pair though one partner forms meiotic spindles (arrows) and
spores (arrowhead), while the other maintains interphase microtubules.
Extended Data Fig. 1b shows wild type. c, Flow cytometry analysis of DNA
in spores produced by wild-type and pak2Δ matings (n = 10,000). AU,
arbitrary units. d, Transient fusion in pak2Δ matings observed as
P-cell-expressed GFP enters into the M cell followed by fusion pore sealing
(arrow) and spore formation (filled arrowheads) in the P cell. The fusion
focus (empty arrowhead) is labelled by M cell Myo52-GFP and P cell
Myo52-tdTomato. Extended Data Fig. 1d shows wild type.


produced a faint cytoplasmic signal in early mating M cells. Notably,
fusion induced rapid Mi-sfGFP accumulation in the P nucleus
followed by a delayed M nucleus entry (Fig. 3a, Extended Data Fig. 3a,
Supplementary Video 4a, b). In transiently fusing pak2Δ cells,
Mi-sfGFP accumulated in the nucleus of the P cell, which underwent
haploid sporulation (Fig. 3b, Supplementary Video 4c), but not in the
M nucleus. Thus, Mi nuclear accumulation requires a P-cell-specific
factor that remains asymmetrically distributed upon transient fusion.
Three lines of evidence showed that this P-cell-specific factor is Pi.
First, Pi was nuclear: sfGFP-Pi produced a very low-intensity signal in
both P cell and zygote nuclei (Extended Data Fig. 3b, Supplementary
Video 5a). Pi nuclear localization was more evident in fusion-defective
fus1Δ cells (Extended Data Fig. 3c, Supplementary Video 5a) or
upon overexpression in interphase cells (Fig. 3c). Second, cytosolic
Mi-mCherry that was overexpressed in interphase cells accumulated in
the nucleus upon sfGFP-Pi co-overexpression (Fig. 3c). Mi-mCherry
expression also led to increased sfGFP-Pi levels (1.49 ± 0.42-fold,
n = 20, P = 7.95 × 10^-7; Kruskal–Wallis test), as observed above (see
Fig. 2d). Finally, deletion of Pi prevented Mi-sfGFP nuclear enrichment
after fusion (Fig. 3d, Supplementary Video 5b).

We investigated the causes of Mi asymmetric nuclear localization.
The asymmetry reversal associated with the Mi–Pi swap indicates that
asymmetry is independent of other cell-type-specific factors (Fig. 2g,
Extended Data Fig. 2d, e). The 5-kDa Mi peptide may diffuse more
rapidly than its 19-kDa partner Pi. However, tagging Mi with 27-kDa




Fig. 2 | The meiotic inducer Mei3 is rapidly induced from the genome
of cells expressing homeodomain factor Pi. a, GFP-tagging of P genome
mei3 leads to Mei3-sfGFP expression in the pak2Δ spore-forming P cell
(arrowhead), but not in the M cell. n > 10 cells. b, GFP-tagging of M
genome mei3 shows no expression during transient fusion of pak2Δ M
and P cells (which produces spores (arrowhead)) but strong expression
when M and P cells completely fuse. n > 10 cells. c, Mei3-sfGFP signal
from zygotes of heterothallic wild-type cells with mei3 tagged in only
one parental genome. Time zero is cell fusion, visualized by transfer of
cytosolic mCherry. Solid lines represent average; shaded areas include two
standard deviations. *P = 1.39 × 10^-…, Kruskal–Wallis test. d, Western
blot of cell lysates (input) from interphase cells overexpressing sfGFP-Pi,
Mi-Halo or both, immunoprecipitated with GFP affinity beads (GFP IP)
and treated or not with benzonase. Tubulin served as loading control.
Note Mi co-purification with Pi and Pi stabilization by Mi co-expression.
e, Coomassie-stained SDS–PAGE and western blots of benzonase-treated
lysates (input) obtained from Escherichia coli co-expressing MBP-Pi and
either Mi or Mi-6His, purified with cobalt resin (His pull down).
f, Chromatin immunoprecipitation (ChIP) for endogenous sfGFP-tagged
Pi and Mi, in the presence or absence of the binding partner, analysing
mei3 promoter and off-target region association. Extended Data Fig. 2e
shows replicates. g, Mei3 induction upon mating of cells with swapped
Mi–Pi. Data processed as in c. *P = 1.8 × 10^-…, Kruskal–Wallis test. Gels
are available in Supplementary Fig. 1.
sfGFP did not abrogate asymmetry, which suggests that asymmetry
is not simply reliant on protein size. We thus investigated the possible
contribution of Pi and Mi localization to the nucleus and cytosol. We
mimicked these differentially localized interactors with the synthetic
hetero-specific short coiled-coils SynZip3 and SynZip4²², the former
fused to mCherry, the latter to sfGFP and a nuclear localization signal,
each expressed in a separate partner. Upon cell fusion, the cytosolic
marker accumulated in the partner's nucleus immediately, whereas
equilibration of the nuclear marker was delayed, mimicking Mi and
Pi behaviours (Fig. 3e, Supplementary Video 6). Such asymmetry was
even more notable upon transient fusion (Extended Data Fig. 3d,
Supplementary Video 6). No asymmetric localization was observed
in the absence of a nuclear localization signal (Extended Data Fig. 3,
Supplementary Video 6). Thus, asymmetry is inherent to the distinct
subcellular localization of Mi and Pi, a design that promotes rapid mei3
transcription from the P genome.

In summary, cell fusion reconstitutes a bipartite Pi–Mi transcription
factor. Pi in the P nucleus rapidly traps cytosolic Mi (Fig. 4d). The Pi–Mi
complex then binds the mei3 promoter and induces Mei3 expression
from the P genome. Mei3 expression from the M genome is postponed
owing to the delayed exchange of Pi between partners.

To address the role of rapid Mei3 induction, we delayed Mei3
expression in two ways. We first built a transcriptional delay by placing Mi in
M cells under control of a P-specific promoter, such that Mi is expressed
only after successful fusion. This delayed Mei3-sfGFP expression
by about 30 min (Extended Data Fig. 4a, Supplementary Video 7a).
Second, we simply deleted P genome mei3, such that Mei3 is expressed
only from the M genome and thus with an approximately 15-min
delay (Fig. 2c). Though neither approach affected mating or
sporulation efficiencies upon fusion of otherwise wild-type cells (Extended
Data Fig. 4b), both prevented haploid meiosis upon transient fusion of
pak2 mutants. Instead, pak2Δ partners persistently attempted to form
a zygote (Fig. 4a, Extended Data Fig. 4c, Supplementary Video 7b). As
expected, deleting mei3 in pak2Δ M cells did not affect haploid meiosis
in the P cell (Extended Data Fig. 4d, Supplementary Video 7).

Fig. 3 | Meiotic regulators Pi and Mi localize asymmetrically in early
zygotes. a, b, M-cell-specific cytosolic Mi-sfGFP rapidly accumulates in
the P nucleus after partners fuse, and only subsequently in the M nucleus
in wild type (a) and pak2Δ (b). Mi-sfGFP accumulates in pak2Δ P
nucleus but not in M nucleus upon transient fusion. The punctate cortical
signal is background (probably mitochondrial) fluorescence. Right
panels quantify average (lines) and two standard deviations (shaded area)
of nuclear Mi-sfGFP. In a, *P = 1.79 × 10^…, Kruskal–Wallis test. In b,
*P < 8.97 × 10^-…, Kruskal–Wallis test. c, Exponentially growing mei3
cells overexpressing sfGFP-Pi, Mi-mCherry or both proteins. Note
nuclear Mi-mCherry upon Pi co-expression and higher sfGFP-Pi signal
upon Mi co-expression. d, Zygotes produced by Mi-sfGFP-expressing M
cells and wild-type (top) or Pi-deleted (bottom) P cells. Uch2-mCherry labels
nuclei. e, Mating of wild-type nuclear localization signal (NLS)-sfGFP-
SynZip4-expressing and mCherry-SynZip3-expressing cells. Upon fusion,
the cytosolic red fluorophore rapidly accumulates in the partner's nucleus,
whereas nuclear green fluorescence exchange is delayed. Some vacuolar
red signal does not readily exchange. Right panel quantifies nuclear
signals, and is presented as in a. *P = 3.39 × 10^…, Kruskal–Wallis test.
Fig. 4 | Rapid induction of Mei3–Mei2 signalling suppresses mating in
zygotes and prevents polyploid formation. a, Mating of pak2Δ M cells
expressing Mi from P-cell-specific p™™* promoter with cytosolic
GFP-labelled P cell. Transient fusion, observed as GFP exchange and signal
build-up in only one partner, never induced meiosis (n = 10). The cells
shown persistently mate until complete fusion, followed by sporulation.
b, Zygote fusing with an additional partner upon mating of wild-type P
cells with M cells expressing Mi under p™" promoter. Note formation of
spores in the now-triploid zygote. c, Mating of mei3Δ P cells and
GFP-α-tubulin-expressing M cells. Note fusion of three partners, formation of
meiotic spindles (arrows) and spores (arrowheads). d, Model for the
asymmetric localization of the bipartite Pi–Mi transcriptional activator in
early zygotes, which results in fast P genome mei3 expression followed by
Mei2 activation to block re-fertilization.


Remarkably, delays in Mei3 expression in otherwise wild-type crosses
led some zygotes to engage additional partners (Fig. 4b, c, Extended
Data Fig. 4e–g, Supplementary Video 8) (n = 18 out of about 7,000
events for both delay in Mi expression (n = 4 experiments) and mei3
deletion in P cells (n = 3 experiments), with P values of 0.033 and 0.031,
respectively, compared to wild type; Welch's test). Multi-partner zygotes
entered meiosis, as shown by microtubule labelling, and formed
four, probably aneuploid spores (Fig. 4b, c, Supplementary Video 8).
Polyploid zygotes were never observed in wild type (n > 10,000
events) and very rarely when mei3 was deleted from the M genome
(2 of n > 7,000 events (n = 3); P value = 0.42 compared to wild type;
Welch's test). Thus, the rapid onset of Mei3 expression from the zygotic
genome prevents the formation of polyploid zygotes, and
consequently the formation of aneuploid spores.

A larger fraction of zygotes (about 1%) fused with additional
partners upon mei3, Pi or Mi deletion in homothallic (self-fertile) strains,
and these predictably did not enter meiosis (Extended Data Fig. 5a, d,
Supplementary Video 9). In addition, >30% of zygotes exhibited
growth that was suggestive of mating behaviour (Extended Data Fig. 5b,
Supplementary Video 9). Mei3 alleviates repression of the RNA-binding
protein Mei2⁹, which (in complex with meiRNA encoded by
the sme2 locus⁸) induces meiosis by promoting the expression of the
meiotic forkhead transcription factor Mei4²³,²⁴. Deletion of mei2 or
mutation of its C-terminal RNA-recognition motif resulted in about
10% multi-partner zygotes (Extended Data Fig. 5, f, Supplementary
Video 9; see also ref. "). Approximately 90% of mei2Δ mei3Δ double
mutant zygotes grew mating projections, and about 16% fused with
a third partner (Extended Data Fig. 5g, Supplementary Video 9). By
contrast, mei4Δ and sme2Δ zygotes never engaged additional partners,
though both arrested before spore formation (Extended Data Fig. 5, i).
We conclude that the re-fusion block depends on Mei2 signalling
(probably through RNA binding, but not with meiRNA). Thus, mating
repression in zygotes shares components with, but bifurcates from,
meiotic induction.

Mechanisms preventing re-fertilization ensure ploidy maintenance
across generations in evolutionarily divergent phyla. These mechanisms
generally involve rapid changes, such as the release of cortical granules⁴
and shedding of surface receptors in mammals¹, membrane
depolarization in amphibians²⁵ and the degradation of pollen guidance cues in
plants²,³,²⁶. Our results provide a first glimpse into fungal re-fertilization
blocks, by showing that yeast zygotes rapidly initiate transcription to
discontinue mating. Whereas Mi peptides are fast-evolving peptides
that we could only identify in the Schizosaccharomyces lineage by
genome position, Pi-like homeodomain proteins are ancestral
eukaryotic proteins that are extensively used in developmental processes²⁷.
Their primordial role may be in the haploid-to-diploid transition²⁸,
which suggests that their rapid post-fusion activation is used elsewhere
to block re-fertilization. For instance, rapid post-fusion homeodomain
complex formation and nuclear translocation has previously been
observed in Chlamydomonas reinhardtii²⁹.

The bipartite design of the Mi–Pi transcription factor (which is
reminiscent of hormone nuclear receptors and their ligands³⁰) favours
post-fusion activation speed, limited only by the rate at which Mi
reaches the nuclear-localized homeodomain Pi protein. Because this
simple two-component system inherently leads to asymmetric zygotic
expression, asymmetry may follow from the selective pressure to
rapidly block re-fertilization. As even a transient cytoplasmic connection is
sufficient for formation of an active complex, such transcription factor
design may also be used to impart cell fate changes upon (or build
sensors to monitor) other instances of transient cytoplasmic bridge
formation.
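The way a two-component design can produce asymmetry on its own can be illustrated with a toy kinetic model. The sketch below is not taken from the paper. It assumes mass-action trapping of a freely mixing cytosolic component (standing in for Mi) by a nuclear component (standing in for Pi) that equilibrates only slowly between the two nuclei. All rate constants, amounts and the function name `simulate` are invented for illustration.

```python
def simulate(k_trap=1.0, k_exchange=0.02, dt=0.01, t_end=30.0):
    """Euler integration of a toy model of post-fusion complex formation.

    Quantities are in arbitrary units and the rate constants are invented.
    Returns a list of (time, complex_in_P_nucleus, complex_in_M_nucleus).
    """
    mi = 5.0      # cytosolic Mi, shared by both partners once cytosols mix
    pi_p = 1.0    # free Pi in the P nucleus, where Pi is produced
    pi_m = 0.0    # free Pi in the M nucleus, none before fusion
    c_p = 0.0     # Pi-Mi complex in the P nucleus
    c_m = 0.0     # Pi-Mi complex in the M nucleus
    t = 0.0
    traj = [(t, c_p, c_m)]
    for _ in range(int(round(t_end / dt))):
        bind_p = k_trap * mi * pi_p           # Mi trapped by Pi in the P nucleus
        bind_m = k_trap * mi * pi_m           # same reaction in the M nucleus
        move = k_exchange * (pi_p - pi_m)     # slow Pi exchange between nuclei
        mi -= (bind_p + bind_m) * dt
        pi_p += (-bind_p - move) * dt
        pi_m += (-bind_m + move) * dt
        c_p += bind_p * dt
        c_m += bind_m * dt
        t += dt
        traj.append((t, c_p, c_m))
    return traj

trajectory = simulate()
t_final, complex_p, complex_m = trajectory[-1]
print(f"t = {t_final:.0f}: complex in P nucleus = {complex_p:.3f}, "
      f"in M nucleus = {complex_m:.3f}")
```

With these made-up parameters the complex forms first and predominantly in the nucleus that already contains the nuclear partner, which is the qualitative behaviour described above. The model ignores import kinetics, protein synthesis and complex turnover, so it should not be read as a quantitative description.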


Online content
Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0407-5.


Received: 24 March 2017; Accepted: 25 June 2018;
Published online 8 August 2018.


1. Bianchi, E., Doe, B., Goulding, D. & Wright, G. J. Juno is the egg Izumo receptor and is essential for mammalian fertilization. Nature 508, 483–487 (2014).
2. Völz, R., Heydlauff, J., Ripper, D., von Lyncker, L. & Groß-Hardt, R. Ethylene signaling is required for synergid degeneration and the establishment of a pollen tube block. Dev. Cell 25, 310–316 (2013).
3. Bleckmann, A., Alter, S. & Dresselhaus, T. The beginning of a seed: regulatory mechanisms of double fertilization. Front. Plant Sci. 5, 452 (2014).
4. Cheeseman, L. P., Boulanger, J., Bond, L. M. & Schuh, M. Two pathways regulate cortical granule translocation to prevent polyspermy in mouse oocytes. Nat. Commun. 7, 13726 (2016).
5. Merlini, L., Dudin, O. & Martin, S. G. Mate and fuse: how yeast cells do it. Open Biol. 3, 130008 (2013).
6. Kelly, M., Burke, J., Smith, M., Klar, A. & Beach, D. Four mating-type genes control sexual differentiation in the fission yeast. EMBO J. 7, 1537–1547 (1988).
7. Willer, M. et al. Two-step activation of meiosis by the mat1 locus in Schizosaccharomyces pombe. Mol. Cell. Biol. 15, 4964–4970 (1995).
8. Watanabe, Y. & Yamamoto, M. S. pombe mei2+ encodes an RNA-binding protein essential for premeiotic DNA synthesis and meiosis I, which cooperates with a novel RNA species meiRNA. Cell 78, 487–498 (1994).
9. Watanabe, Y., Shinozaki-Yabana, S., Chikashige, Y., Hiraoka, Y. & Yamamoto, M. Phosphorylation of RNA-binding protein controls cell cycle switch from mitotic to meiotic in fission yeast. Nature 386, 187–190 (1997).
10. Li, P. & McLeod, M. Molecular mimicry in development: identification of ste11+ as a substrate and mei3+ as a pseudosubstrate inhibitor of ran1+ kinase. Cell 87, 869–880 (1996).
11. Dudin, O. et al. A systematic screen for morphological abnormalities during fission yeast sexual reproduction identifies a mechanism of actin aster formation for cell fusion. PLoS Genet. 13, e1006721 (2017).
12. Dudin, O. et al. A formin-nucleated actin aster concentrates cell wall hydrolases for cell fusion in fission yeast. J. Cell Biol. 208, 897–911 (2015).

13. Ding 0.A rush nour towards sexual reproduce: the chremasome dynamics 
frig lost, Chin Se Gu $6, 3900-2503 2011). 

14, Ding,0.Q, Yamamoto, A, Haraguchi, T& Hiraoka, ¥ Dynamics ot hamelogous 
chrorasome paling during meiotic prophasein fission yeast. Dev Call 
525-341 (2009), 

15, Yamamoto, 1G, Chlashige, ¥,Ozoe F_Kawamula M.& Hiraoka, ¥ Actuation 
athe pheroraneresponsie MAP hcase dnves hapa els to undergo 
ctpie moos wth normal elomere clustering and sister ehramats, 
Segrogation i sin yest J Ce e117, 3875-3886 2004). 

16, Petersen, J, Weigun. D. Ego, F. & Nilsen, 0, Chracterzation af us] of 
Schzesacchaomjces pombe: 2 developmental corral ureion needed for 
‘conjugation bl Cel Sol 15, 3697-3707 (1995). 

17, Pola S. Bene, 2, Zhang & Gregan J. Mad the Schiaaccharomycer 
pombe homolog of EB. requived for kryogamy and for peometing 
(sllatory nuclear movement dunng meiosis, CellC 13, 72-77 (2014). 

18, Yamashi A, Fujta 7 & lamamota, M. Proper microtubule structure ital for 
timely progression trough mons in fision ysst PS ONE B, 65082 (2013), 

19, Guts, “Twin meiosis" and ther amialences inthe ie cycle of 
Schgesaccharumyces pombe, Ssence 158, 796-798 (1967), 

20. McLeod, M. & Beach, D. A specific inhibitor of the ran1+ protein kinase regulates
entry into meiosis in Schizosaccharomyces pombe. Nature 332, 509-514
(1988).

21. Pédelacq, J.-D., Cabantous, S., Tran, T., Terwilliger, T. C. & Waldo, G. S.
Engineering and characterization of a superfolder green fluorescent protein.
Nat. Biotechnol. 24, 79-88 (2006).

22. Reinke, A. W., Grant, R. A. & Keating, A. E. A synthetic coiled-coil interactome
provides heterospecific modules for molecular engineering. J. Am. Chem. Soc.
132, 6025-6031 (2010).

23. Murakami-Tonami, Y. et al. Mei4p coordinates the onset of meiosis I by
regulating cdc25+ in fission yeast. Proc. Natl Acad. Sci. USA 104, 14688-14693
(2007).

24. Mata, J., Wilbrey, A. & Bähler, J. Transcriptional regulatory network for sexual
differentiation in fission yeast. Genome Biol. 8, R217 (2007).

25. Jaffe, L. A. Fast block to polyspermy in sea urchin eggs is electrically mediated.
Nature 261, 68-71 (1976).

26. Beale, K. M., Leydon, A. R. & Johnson, M. A. Gamete fusion is required to block
multiple pollen tubes from entering an Arabidopsis ovule. Curr. Biol. 22,
1090-1094 (2012).

27. Derelle, R., Lopez, P., Le Guyader, H. & Manuel, M. Homeodomain proteins belong
to the ancestral molecular toolkit of eukaryotes. Evol. Dev. 9, 212-219 (2007).

28. Bowman, J. L., Sakakibara, K., Furumizu, C. & Dierschke, T. Evolution in the
cycles of life. Annu. Rev. Genet. 50, 133-154 (2016).

29. Lee, J.-H., Lin, H., Joo, S. & Goodenough, U. Early sexual origins of homeoprotein
heterodimerization and evolution of the plant KNOX/BELL family. Cell 133,
829-840 (2008).

30. Evans, R. M. & Mangelsdorf, D. J. Nuclear receptors, RXR, and the big bang. Cell
157, 255-266 (2014).


Acknowledgements We thank M. Bühler, T. Kuzdere, F. Bendezú, S. Pelet,
J. Mata, B. Arcangioli and G. Thon for strains and experimental advice, and
R. Benton, T. Andersen, O. Dudin, S. Mitri, J.-W. Veening and members of the
Martin laboratory for manuscript suggestions. An EMBO long-term fellowship
to A.V., ERC Consolidator Grant (CellFusion) and Swiss National Science
Foundation grant (31003A_155944) to S.G.M. supported this work.


Reviewer information Nature thanks S. Grewal, J. Heitman and O. Nielsen for
their contribution to the peer review of this work.


Author contributions S.G.M. and A.V. conceived the project, designed
experiments and wrote the manuscript. A.V. performed experiments, with P.N.
assisting with pulldowns and L.M. with constructing strains.


Competing interests The authors declare no competing interests.


Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41586-018-0407-5.

Supplementary information is available for this paper at
https://doi.org/10.1038/s41586-018-0407-5.

Reprints and permissions information is available at http://www.nature.com/reprints.

Correspondence and requests for materials should be addressed to S.G.M.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

© 2018 Springer Nature Limited. All rights reserved.


METHODS
No statistical methods were used to predetermine sample size. The experiments
were not randomized and investigators were not blinded to allocation during
experiments and outcome assessment.

Growth conditions. The growth conditions used for experiments have been detailed
previously, and an overview of the experimental procedure is presented in
Extended Data Fig. 6a. In brief, freshly streaked cells were inoculated into
MSL + N medium and incubated overnight at 25 °C with 200 rpm shaking to
exponential phase. The following evening, cultures were diluted to OD600 = 0.025
in 20 ml of MSL + N medium and incubated overnight at 30 °C with 200 rpm
agitation to exponential phase. Experiments on exponentially growing cultures
were performed at this point. For time-lapse imaging of mating, homothallic
cells or 1:1 mixtures of heterothallic cells were pelleted for one minute at
1,000g in microcentrifuge tubes and washed three times in 1 ml of MSL-N medium.
Cells were then diluted in 3 ml of MSL-N medium to a final OD600 of 1.5 and
incubated at 30 °C with 200 rpm agitation for 4-6 h. Finally, cells were mounted
onto MSL-N agarose pads, covered with a coverslip and the chamber was sealed
using VALAP (vaseline, lanolin and paraffin, 1:1:1).
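
The dilution steps above amount to a simple C1 x V1 = C2 x V2 calculation. The sketch below is illustrative only and is not part of the published protocol: it assumes that OD600 scales linearly with cell density in the range used, and the function name and the starter-culture OD in the example are hypothetical.

```python
def starter_volume_ml(od_starter: float, od_target: float, final_volume_ml: float) -> float:
    """Volume of starter culture needed to reach od_target in final_volume_ml.

    Uses C1 * V1 = C2 * V2, assuming OD600 is proportional to cell density.
    """
    if od_starter <= 0 or od_target <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if od_target > od_starter:
        raise ValueError("cannot dilute to a higher optical density")
    return od_target * final_volume_ml / od_starter


# Hypothetical overnight culture at OD600 = 0.5, diluted to OD600 = 0.025 in 20 ml.
volume = starter_volume_ml(0.5, 0.025, 20.0)
print(f"{volume:.2f} ml starter culture + {20.0 - volume:.2f} ml fresh medium")
```

The same helper applies to the later dilution to OD600 = 1.5 in 3 ml, provided the washed cell suspension is denser than the target.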

For flow cytometry and quantifications of mating and sporulation efficiencies,
and mating and sporulation defects, ~3 × 10^7 MSL-N washed cells were
resuspended in 20 µl of MSL-N medium, spotted onto MSL-N 2% agar plates and
incubated at 25 °C for the indicated number of hours. The number of unmated cells,
unsporulated zygotes and sporulated zygotes was determined using transmitted
light microscopy, and mating and sporulation efficiencies were calculated using
the following formulas:


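$$\text{Mating efficiency (\%)} = \frac{2\,n_{\text{zygotes}}}{n_{\text{unmated cells}} + 2\,n_{\text{zygotes}}} \times 100$$

$$\text{Sporulation efficiency (\%)} = \frac{n_{\text{sporulated zygotes}}}{n_{\text{zygotes}}} \times 100$$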

The reported results are from three replicates, error bars denote standard
deviation and P values were calculated using a two-tailed Welch's t-test
assuming normal distribution.
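
As a worked illustration of the efficiency calculations, the following minimal sketch computes both percentages from cell counts. It assumes that each zygote counts as two mated cells and that the zygote total is the sum of unsporulated and sporulated zygotes; the function names and the example counts are hypothetical and are not data from this study.

```python
def mating_efficiency(n_unmated: int, n_zygotes: int) -> float:
    """Percentage of cells that mated; each zygote derives from two cells."""
    total_cells = n_unmated + 2 * n_zygotes
    if total_cells == 0:
        raise ValueError("no cells counted")
    return 100.0 * 2 * n_zygotes / total_cells


def sporulation_efficiency(n_unsporulated: int, n_sporulated: int) -> float:
    """Percentage of zygotes that sporulated."""
    n_zygotes = n_unsporulated + n_sporulated
    if n_zygotes == 0:
        raise ValueError("no zygotes counted")
    return 100.0 * n_sporulated / n_zygotes


# Hypothetical counts from one field of view.
unmated, unsporulated, sporulated = 120, 30, 60
print(mating_efficiency(unmated, unsporulated + sporulated))
print(round(sporulation_efficiency(unsporulated, sporulated), 1))
```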

Diploid strains. The diploid strains were constructed from haploid P and M cells
in which mating type switching was abolished due to a H1Δ17 mutation linked to
natMX and kanMX selection cassettes, respectively. Cells of the two mating types
were mated for 24 h on solid ME medium at 25 °C and then plated on YE solid
medium containing both nourseothricin and kanamycin. Colonies resistant to
both antibiotics were subjected to Hoechst staining followed by flow cytometry to
identify diploid strains. Presence of both mat1-P and mat1-M loci was confirmed
by diagnostic PCR.

Flow cytometry. We collected the mated cell suspensions from MSL-N plates
and resuspended samples in 1 ml of MSL-N medium containing 10 µl of glusulase
(NEE154001EA, Perkin-Elmer). After overnight incubation at 30 °C we visually
verified that all cells except spores had lysed upon glusulase treatment. The
samples were washed three times with MSL-N, resuspended in 3 ml of water
containing 2.5 µg/ml of Hoechst 33342 DNA dye (B2261, Sigma-Aldrich) and
incubated for 15-30 min at room temperature. Samples were immediately analysed
on a Becton Dickinson Fortessa with proprietary software platform using the
355-nm laser line with a 450/50 filter. The experiment was run in duplicate.

Strains. All strains are reported in Supplementary Table 1. Standard fission yeast
genetic and molecular biology tools were used in the study. Sequences of mutants
obtained in this study are provided as annotated, GenBank-formatted sequences
available at https://figshare.com/s/6d81d779a8550b2caec6 (supplementary
sequences 1-27).

The mating loci of strains used in the study are illustrated in Extended Data
Fig. 6b. Mi and Pi deletions remove the ORF of the genes. sfGFP tagging of Pi and
Mi was done at the N and C terminus, respectively.

In homothallic strains, the mat1 locus determines the cell mating type and
switches between sequences encoded by the silent mat2 and mat3 loci (reviewed
elsewhere). Importantly, a H1Δ17 mutation at the mat1 locus H1-homology box
prevents mating type switching. As depicted in Extended Data Fig. 6b, in the
synthetic mat1 locus sequences, the wild-type H1-homology box was replaced
with the one amplified from the PBY strain carrying the H1Δ17 mutation (a gift
from B. Arcangioli, Institut Pasteur). The recombinant DNA also contains
selection cassettes and homology arms that target the construct to the mat1 locus
of homothallic cells. The described sequences were cloned and amplified inside
the pBlueScript plasmid. Finally, the indicated restriction enzymes (see
supplementary sequences 1-4) were used to extract the fragment of interest and
transform it into h90 wild-type cells. Correct integration was verified by PCR
and/or inability to perform mating type switching.
The mat2 and mat3 loci, which reside in heterochromatic DNA and thus cannot be
targeted by homologous recombination in wild-type cells, were manipulated as
dered nv hn aeevicw ofthe expecta procure pete 
Extended Dats Fig 7a-The ma locus arpultions were performed in he TPL 
rn (oi rons Thon, Univers of Copenhagen), wheal thc pee, 
tallow recombination athe mai2 locas Furthermore the TP sin crries 
tide and vt east in the vicinity of the mat lcon. The TPL stan ws 
traniormed wih DNA fragments carrying the designed ma2ocusand fang 
ald ipeanqrences $A etn lente soto were sec nd teed 
foecerect integration by PCR. The mat2 mata leas was hen introduced in 
otherwise wtp cal via geet rons at manipulations were say 
performed nthe POSORY stein agit fom G. Thon), which ack the ci gene 
{nears the uno caste inthe proximity of the mat lacs- The sequences 
ofthe tot nat and atc az provide (ae supplementary seqdents 5-8) 

or overepresson we used ihe constutve ah promber comprising 
1000 bp upstream of the srt codon to drive expression of FI Ntrminally 
tagged with GFE. Thiscontroct war cloned into pasmidsthat contain he bod 
dingyeast ADH anscpinaltorminabr sequence BX selection eases 
nd eqencestareting intgetion a either ade oh locus sopplementary 
Sequences 9, 10) Pre ig ado targeting pss nd Stl digest ss 
tegen pls were tanaermed in ale aS mutant cal especie 
Prottrophiezeocin-resant clones were cet and checked fer eet ine 
{ation PER and gene crosses 

Tori owerespresron we sed eer the dh othe mt! prometerto die 
expression of Mi that was terminal tagged with iter Hala or Cherry 
“Thee constructs were oned into plasm that tare iter ue or locus 
and contain kanh IX or ma slection ase, respectively (supplementary 
feuencs i, 12) digested ro argting plasm and RatZI71 etd > 
tap pla yes nde crvpendng atopic tin nd r- 
totopic ones vith adequate anibiocestance were sete and checked for 
corm inition by PCRand gence. 

inc mutant background Wa sed to prevent indation of meiosis co- 
expresion ef Miand FP" as wate eae or conlstans xpesing nia 
penn 

For i expression from the P-cell speci promoter we used 2063 bp 
upstream ofthe mp3 start codon flowed bythe Ri OR sequence the bud 
ding yas ADH transriptional terminator sequence and BleMX selection. This 
Construct was done nse the pK plasmid tha targets the constr the 
in lcos ater Nd dgetion (npplemetary sence 3). The ineatize pls 
ind transformed ith cllslacking nda GAldrestant transformant 
thatefcienl spouted when crowed with wild-type alo was ected. The 
sane traniovnatin ne was used to ntrodue the pok2AhphIX mutton 
2Sdscribed blow 

pak tat stein won dese rm of For the pak 2 hgh we 
clonedthe 180 bp of¥ UTR pa followed by 399bp ofthe 3 UTR of pek2int 
thep Fass AphMX vector which rested an Atte tween the two UTR. The 
obtained plas supplementary sequence 1) ws inetd with A, raps 
toemedin clan ygromyein Besant colonies were selected andes for 
correct integration by ER. 

ime skanNX and mish MX mutant strains were derived from the 
S.pombe deletion brary sain S7B02 and S20408 especie (Boner) and 
were vere y PCR 

neice and meitScune+- mations were derived fom apa National 
Besos Project tens FY7257 and F736, respecte. 

urna ws generated wang methodology dete in ef. In bie 
primers with 0-bp homology to sequences immediately anking the fast ORF 
{sce Supplementary Table 3) were ned to PCR amplify the ma ection cs- 
Sttttom he prAna posi and tansforned intel Nowsothin 
restan clones were sete and genotyped by CR. 

For constrction of mei point mst we st ned the eon sequence 
313-672 downstexm of ne? top codon} allowed theme gene sequences 
ftom 487 bp upstream af the tart codon to 312 bp downstream of he sop 
Codon), which vests an Al resrieton enzyme Been the agments 
‘This construc yas inserted inthe AS hphM plasmid and we need the 
FeIAA mutation by stedctd mutogeness supplementary equenes 116) 
Linearization ofthe plasmid wit fel tgtsthe construct to repace the genomic 
tne gene upon tanformation The hygromycin Brest clones were selecied 
Ind tbe cone! ineraton replacing the wil ype mei was confirmed by PCR 

‘The strains ys Gnorecetiy abled groom loca were dered mn 
FYL5708train bined fom the pan National Bioenurce Projet In these 
tellthe locus genetical inked othe LacO sequence ray that bindshe 
GGP-Lact expressed fom the dst promoter tthe his locus 

Tor microtubule visualization wus cls with SVD promt driven xpes- 
sionefGFP-n-tbulin® deed oma 'C1234sin agli fm Changs group. 
For mei3 fluorophore tagging, yeast codon-optimized sfGFP (a gift from
M. Knop, Heidelberg University) was introduced into a pFA6a-natMX vector to
obtain the pFA6a-sfGFP-natMX vector (supplementary sequence 17). The plasmid
was then used as a template for PCR with primers that amplify sfGFP followed by
the natMX cassette and carry 80-bp homology to sequences immediately flanking
the mei3 STOP codon (see Supplementary Table 2). The PCR product was transformed
into wild-type cells, and nourseothricin-resistant clones were selected and
genotyped by PCR. All mei3-sfGFP expressing strains used in the study were
derived from the same transformation clone by genetic crosses.

For C-terminal mCherry tagging of nuclear marker uch2 and spindle pole body
marker pcp1 we used the pFA6a-mCherry-natMX and pFA6a-mCherry-kanMX
plasmids, respectively. In both cases, we cloned the 3' UTR region of the gene
followed by a C-terminal fragment of the ORF, keeping it in frame with the
fluorophore (supplementary sequences 18, 19). The obtained plasmids to tag uch2
and pcp1 were then treated with AfeI and AfeI + SnaBI restriction enzymes,
respectively, and the digested DNA was transformed into cells. Clones resistant
to either nourseothricin or G418 were selected and genotyped by PCR.

{FP and myos2-tdTomato were introduced from strains FCAST and 
respectively 
The constructs to express the green and red fluorophores from the P-cell-specific
map3 promoter were described previously. For the expression of GFP from
the M-cell-specific mam2 promoter we cloned the 438 bp of promoter upstream of
the start site into a pRIP-GFP plasmid that contains the ura4+ selection cassette
(supplementary sequence 20). Transformation of the ura4- mutant cells with the
digested final plasmid targeted the construct to the native mam2 promoter
and conferred growth in absence of uracil.

Sequences encoding small interacting SynZip peptides were a gift from
S. Pelet, University of Lausanne. The SynZip3 and SynZip4 peptides were cloned
at the C terminus of mCherry and sfGFP, respectively. Expression of mCherry-
SynZip3 was driven from the act1 promoter consisting of 822 bp upstream of
the act1 start codon. The sfGFP-SynZip4 was driven from the tdh1 promoter
comprising 496 bp upstream of the tdh1 start codon. For the NLS-sfGFP-SynZip4
we introduced the SV40 nuclear localization signal at the sfGFP N terminus and
used the tdh1 promoter comprising 496 bp upstream of the tdh1 start codon.
All constructs were integrated in the plasmid that carries the ura4+ selection
cassette and targets the ura4 locus. All constructs are detailed in supplementary
sequences 21-23. The AfeI-linearized final plasmids were transformed into ura4
mutants and uracil prototrophs were selected for further experiments.

To replace the mei3 coding sequence with mCherry we designed a plasmid that
upon restriction digestion with PacI and SpeI produced a linear fragment in which
405 bp upstream of the mei3 start codon were fused to the mCherry sequence and
pFA6a-hphMX backbone sequence followed by 101 bp downstream of the mei3 stop
codon. This fragment (supplementary sequence 24) was purified and transformed
into wild-type cells, and hygromycin-B-resistant clones were selected. Correct
integration replacing the wild-type mei3 was confirmed by PCR.

"To simultancously label mating M celle with green fuorescence and P cells 
with red fuorescence, we cloned into plasmid to target the ade locus supple- 
‘mentary sequence 27): 1,75 bp ofthe muon! promote, with a point mutation in 
the Ae st, infront of GFP coding sequence followed by budding yeast ADIL 
‘uanscrptional terminator. the blasticdin-S resistance gene, and 2,063 bp of map3 
promoter driving expression of mCherry fused with another copy of the budding 
yest ADH transcriptional terminator. Pmel linearized plasmid was transformed 
{nto cells blasticidin resistant clones were selected and cores integration con- 
firmed by PCR. 

For DNA extraction to check efficiency of nuclease treatment, cell lysates were
mixed with sodium dodecyl sulfate (1% final concentration) and either sodium
acetate (300 mM final concentration) or lithium acetate (250 mM final
concentration) and incubated at 70 °C for 10 min. After adding two volumes of
ethanol and vortexing, the DNA was pelleted by centrifugation at 4 °C for 15 min
at 13,000 rcf. Pellets were washed with 1 ml of 70% ethanol, air-dried,
resuspended in TE buffer and incubated at 50 °C for one hour. Samples were
analysed on agarose gels containing SYBR Safe (S33102, Thermo Fisher).

Protease inbibitors mix used in biochemical purifications inched 100 mM. 
PMSF, 15.6 g/ml benzamidine-HCL, 0.5 ug/ml antipain, 0.5 jig/ml aprotinin, 
(0.5 pg/ml chymostatin, 0.5 g/m leupetin, 0 peal pepstatin A, 0.5 ug/ml 
phenanthroline and a commercial inhibitor cacktal (11836153001, Roche) 
Co-immunoprecipitation. We grew 100 ml of mei3Δ mutant cell cultures
overexpressing Mi-Halo or sfGFP-Pi or both to exponential phase and collected
cells by centrifugation for one minute at 1,000 rcf. Subsequent steps were
performed on ice unless otherwise indicated. Cells were transferred to 2-ml
microcentrifuge tubes, washed twice with ice-cold PBS buffer (NaCl 137 mM,
KCl 2.7 mM, Na2HPO4 10 mM, KH2PO4 1.8 mM, pH 7.5, protease inhibitors mix) and
resuspended in 500 µl of the PBS buffer, and ~1 ml of acid-washed glass beads
was added. Cells were lysed using the FastPrep-24 bead beater (MP Biomedicals)
set to 4.5 m/s with 10 cycles of 20-s beating and 40-s cooling on ice. The beads
were removed and samples transferred to a new tube in which the cell debris was
pelleted by centrifugation for 15 min at 13,000 rcf. We collected the supernatant
and determined the protein concentration using the Bradford assay. We adjusted
the samples to 5 mg of total protein in 800 µl of PBS buffer, added 1 µl of
TMR-Halo Ligand (G8251, Promega) and kept samples protected from light
thereafter. Each sample was split in half and one aliquot was treated with 2 U of
benzonase (E1014, Sigma-Aldrich) for one hour at room temperature on a rotator
stand. Twenty microlitres of the sample was used for DNA extraction (see above)
and 40 µl for western blotting analysis (Fig. 3d, labelled as input). The samples
were then incubated with 10 µl of GFP-Trap beads (gtma-20, Chromotek) prewashed
in PBS buffer. After one hour of incubation on a rotating stand at 4 °C, beads
were washed two times in PBS buffer and two times in CXS buffer (50 mM HEPES,
20 mM KCl, 2 mM EDTA, 1 mM MgCl2, pH 7.5, protease inhibitors) with each wash
performed in a new microcentrifuge tube. Beads were then resuspended in Laemmli
buffer, heated to 70 °C for 10 min and then subjected to SDS-PAGE (XP10205BOX,
Thermo Fisher). Mi-Halo signal was recorded using the Cy3 channel on the
Amersham Typhoon System with proprietary software (General Electric) and the gel
was then blotted onto a nitrocellulose membrane using the Towbin transfer buffer
(25 mM Tris-base, 192 mM glycine, 20% methanol, pH 8.3). After blocking in 5%
milk, the membrane was first probed with anti-GFP antibody (Cat. No. 11814460001,
Roche; 1:3,000 dilution) followed by secondary HRP-coupled anti-mouse antibody
(W402B, Promega; 1:3,000 dilution), developed and visualized on the Fusion FX
(Vilber). Subsequently, the same membrane was probed with the TAT-1 antibody
(a gift from K. Gull, University of Oxford) targeting tubulin as loading control.
DNA was extracted as described using lithium acetate and analysed on an agarose
gel (Extended Data Fig. 8a). The experiment was reproduced in three independent
replicates.
Co-purification of recombinant Pi and Mi. Purification of Pi N-terminally tagged
with either MBP, GST or GFP all failed owing to the instability of the protein in
both E. coli cells and post purification (data not shown). However, co-expression
of Mi greatly stabilized Pi and thus we proceeded to purify the recombinant Pi-Mi
complexes. The expression plasmids used (supplementary sequences 25, 26) are
derived from the pMAL backbone and use the inducible tac promoter to drive
expression of bicistronic mRNA encoding MBP-Pi followed by two stop codons, an
internal ribosome entry site (IRES) and either Mi or Mi-6×His. Western blotting
showed that there was no considerable translational read-through between the
ORFs of Pi and Mi. Coding sequences of both Pi and Mi were codon optimized
for expression in E. coli.

An E. coli BL21 strain carrying the expression vector was inoculated in 25 ml
of LB medium with 100 µg/ml ampicillin and incubated overnight at 37 °C with
shaking at 200 rpm. The next morning cultures were diluted in 25 ml of medium
to OD600 = 0.2 and allowed to grow to OD600 = 0.8 before diluting them again
in 150 ml of medium to OD600 ≈ 0.1 and incubated at 37 °C with shaking until
OD600 reached 0.6. At this point isopropyl-β-D-1-thiogalactopyranoside (IPTG)
was added to a final concentration of 150 µM and cells were incubated for 6 h at
18 °C with 200 rpm shaking. Cells were then collected by centrifugation at
3,000 rcf for 15 min, washed in 25 ml of buffer L (30% glycerol v/v, 50 mM
Tris-HCl pH 7.5, 100 mM KCl, 1 mM imidazole) and cell pellets frozen in liquid
nitrogen.

All subsequent steps were performed in presence of the protease inhibitor
mix described above. Frozen cell pellets were thawed on ice and resuspended
in 8 ml of buffer LPI (50 mM Tris-HCl pH 7.5, 10 mM KCl, 1 mM imidazole,
protease inhibitors mix) and sonicated on ice with Sonoplus HD2070 (Bandelin)
at 45% power output for 5 cycles of 30-s pulses followed by 30-s pauses. Lysates
were shifted to microcentrifuge tubes and cell debris removed by centrifugation
at 15,000 rcf for 15 min at 4 °C. Lysates were then pooled together in a 15-ml
falcon tube and the Bradford assay was used to measure protein content. Protein
concentrations of all samples were equalized with buffer LPI, 100-µl aliquots
saved for DNA extraction and the remaining sample treated with 2 kU of benzonase
for two hours at room temperature with rotation. Another set of 100-µl aliquots
was collected to extract DNA (Extended Data Fig. 8b) and 100-µl aliquots were
set aside for probing by western blotting (Fig. 2e, input). Each sample was then
mixed with 20 µl of HisPur cobalt resin (89964, Thermo Fisher) prewashed in
buffer LPI and samples were incubated with rotation at 4 °C for 4 h. Beads were
pelleted by centrifugation at 60 rcf for 2 min at 4 °C and washed twice with 10 ml
of buffer W1 (30% glycerol v/v, 50 mM Tris-HCl pH 7.5, 100 mM KCl, 10 mM
imidazole) and twice more with 10 ml of buffer W2 (30% glycerol v/v, 400 mM
NaCl, 50 mM Tris-HCl pH 7.5, 100 mM KCl, 10 mM imidazole). For elution the
beads were incubated with buffer E (30% glycerol v/v, 50 mM Tris-HCl pH 7.5,
100 mM KCl, 250 mM imidazole). Eluates were then resuspended in Laemmli
buffer and used for SDS-PAGE with proteins visualized through Coomassie
staining and western blotting. For primary antibodies we used a 1:1,000 dilution
of anti-MBP antibody (2396, Cell Signaling) and 1:2,000 anti-His antibody (34660,
Qiagen). Infrared fluorophore-coupled secondary antibodies at 1:500 dilution
(R-05061 and R-05054, Advansta) were used and blots visualized with the Fusion
FX (Vilber). DNA was extracted as described above using sodium acetate and
analysed on an agarose gel (Extended Data Fig. 8). The experiment was performed
in biological triplicate.

Chromatin immunoprecipitation. Mating between fission yeast cells is
asynchronous and thus we performed the chromatin immunoprecipitation on diploid
cells. Because Pi and Mi induce mei3 only in early meiosis, we prevented meiotic
progression by replacing the mei3 ORF with that of mCherry. These adjustments
resulted in the majority of cells responding to nitrogen starvation by strongly
inducing mCherry expression from the mei3 promoter in a Pi- and Mi-dependent
manner (data not shown). We used cells expressing untagged Mi and Pi to
determine the background DNA binding. The ChIP testing Mi binding to the mei3
promoter was performed on diploid cells carrying Mi-sfGFP at the mat1-M locus
while the mat1-P locus either encoded wild-type Pi or lacked its ORF. Likewise,
to test Pi binding to the mei3 promoter we used diploid cells expressing sfGFP-Pi
from the mat1-P locus while the mat1-M locus either encoded wild-type Mi or
lacked its ORF.

The experiment was performed in biological triplicate. Cells were grown at
30 °C overnight to exponential phase in 50 ml of EMM2 medium and then
diluted in 100 ml of EMM2 medium to a final OD600 = 0.03. After 12 h incubation,
cells reached exponential growth (OD600 = 0.6) and were collected by
centrifugation at 1,000 rcf for 3 min. Cell pellets were then washed 3 times with
20 ml of EMM-N medium (EMM lacking ammonium chloride) and resuspended in
50 ml of EMM-N to a final OD600 = 1. The suspensions were incubated for 10 h
at 25 °C and shifted to 50-ml centrifugation tubes. To cross-link proteins and
DNA, we added 1.3 ml of 37% formaldehyde solution (F8775, Sigma-Aldrich)
and incubated at room temperature for 15 min on a tube roller. Cross-linking
was quenched with 2.6 ml of 2.5 M glycine followed by 5 min incubation at room
temperature on a tube roller. Cells were then collected by centrifugation at
1,000 rcf for 3 min at 4 °C, washed twice with 15 ml of ice-cold PBS buffer,
transferred to 2-ml screw cap tubes and cell pellets frozen in liquid nitrogen
for 30 min. Next, samples were thawed on ice and resuspended in 500 µl of freshly
prepared lysis buffer (50 mM HEPES/KOH pH 7.5, 140 mM NaCl, 1 mM EDTA,
1% Triton X-100, 0.1% Na-deoxycholate, 1 mM PMSF and inhibitor cocktail
(11836153001, Roche)). After adding 0.5-mm zirconium beads nearly to the
meniscus, cells were lysed using the FastPrep-24 5G bead beater (MP Biomedicals)
at 6.5 m/s with 3 cycles of 60-s beating and 5 min cooling on ice. Crude lysates
were transferred to 15-ml polystyrene falcon tubes, and the sample volume was
adjusted to ~1.5 ml before sonication using the Bioruptor Plus system equipped
with a 4 °C water bath (Diagenode). We performed 3 rounds of sonication: each
sonication round consisted of 10 cycles of 30 s high-intensity sonication and
30 s pause, and was followed by one 10-min pause on ice. Samples were shifted to
microcentrifuge tubes and cell debris pelleted by centrifugation at 13,000 rcf
at 4 °C. Supernatants were transferred to new microcentrifuge tubes and cleared
by centrifugation at 13,000 rcf for 15 min at 4 °C. The DNA concentration was
measured in each lysate by Qubit fluorometric quantitation (Invitrogen) and all
lysates adjusted to the same DNA concentration. The samples were then incubated
with rotation for 8 h at 4 °C with 30 µl of magnetic GFP-Trap beads (gtma-20,
Chromotek) prewashed three times in PBS buffer with 0.02% Tween. Next, beads
were pelleted by centrifugation at 100 rcf for 1 min at 4 °C and immobilized
using the magnetic rack before discarding the supernatant. Beads were then
subjected to three washes with 1 ml of freshly prepared ice-cold lysis buffer,
one wash with 1 ml of freshly prepared ice-cold wash buffer (10 mM Tris-HCl
pH 8.0, 250 mM LiCl, 0.3% NP-40, 0.5% Na-deoxycholate, 1 mM EDTA) and one wash
with 1 ml of ice-cold TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA). Each wash
was performed at room temperature with tube rotation that lasted 5 min. Beads
were eluted by incubation with 125 µl of TES buffer (10 mM Tris-HCl pH 8.0,
1 mM EDTA, 1% SDS) at 65 °C for 10 min with agitation. Supernatant was collected
and the beads eluted once more with 125 µl of TES buffer. The two eluates were
pooled and incubated overnight at 65 °C with agitation to reverse the
formaldehyde-induced crosslinking. Samples were cooled down and then incubated
with 10 mg/ml RNase A (10109169001, Roche) for 1 h at 37 °C and subsequently with
3 µl of 20 mg/ml proteinase K (3115879001, Roche) for 1 h more at 37 °C. We added
15 µl of 5 M NaCl, 300 µl of isopropanol and 30 µl of DNA-binding AMPure XP beads
(Beckman Coulter) to each sample and incubated samples at room temperature
for 15 min with rotation. Beads were pelleted by centrifugation at 100 rcf for
1 min at 4 °C, immobilized using the magnetic rack and washed twice with 500 µl
of 80% ethanol. The beads were briefly dried and DNA eluted with 20 µl of
Tris-HCl pH 8.0. Obtained DNA was diluted twofold and used as template in
real-time PCR with primers at final concentration 0.3 µM and SsoAdvanced
Universal SYBR Green Supermix (172-5274, Bio-Rad) performed on a 7500 Fast
Real-Time PCR System (Applied Biosystems). Primers used in the PCR are listed in
Supplementary Table 3. The qPCR was done in duplicate.

Data normalization was performed based on actin promoter binding, and
the binding observed in cells that do not express the affinity tag is considered
as unspecific and is set to 1. Thus fold enrichment for the locus of interest
(loi) was calculated as

$$\text{fold enrichment} = 2^{-\left(\Delta C_t^{\text{tagged}} - \Delta C_t^{\text{untagged}}\right)}$$

in which $\Delta C_t = C_t^{\text{loi}} - C_t^{\text{actin}}$ and $C_t$ is the threshold cycle. Reported values for
each of the biological triplicates are averages of two technical replicates.
Position of the middle nucleotide of the qPCR amplicon in reference to the start
codon was used in the graph. Because the amplitude of binding of the Pi-Mi
complex to the mei3 promoter varied between biological replicates, we present
the data separately.
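
The normalization described above follows the general logic of a comparative threshold-cycle calculation. The sketch below is a minimal illustration under stated assumptions: it assumes perfect doubling per PCR cycle, uses the actin promoter as the reference and the untagged strain as the calibrator set to 1, and all function names and Ct values are hypothetical rather than taken from this study.

```python
def delta_ct(ct_loi: float, ct_actin: float) -> float:
    """Threshold cycle of the locus of interest relative to the actin promoter."""
    return ct_loi - ct_actin


def fold_enrichment(ct_loi_tagged: float, ct_actin_tagged: float,
                    ct_loi_untagged: float, ct_actin_untagged: float) -> float:
    """Enrichment in the tagged strain relative to the untagged strain.

    Assumes 100% amplification efficiency (a factor of 2 per cycle).
    """
    ddct = (delta_ct(ct_loi_tagged, ct_actin_tagged)
            - delta_ct(ct_loi_untagged, ct_actin_untagged))
    return 2.0 ** (-ddct)


# Hypothetical Ct values, each already averaged over two technical replicates.
print(fold_enrichment(24.0, 27.0, 28.0, 27.0))
```

By construction, feeding the untagged sample in as both arguments returns 1, which matches the convention that unspecific binding is set to 1.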
Microscopy and image analysis. Imaging of mating cells was performed as
described previously. We note that the frequency of re-fertilization events in
mei3Δ, PiΔ and MiΔ strains appeared to increase further with time than the
numbers reported, but this increase could not be quantified owing to
deterioration of the imaging chambers.

To acquire images in Fig. 3c and Extended Data Fig. 1e we used a spinning-disk
confocal system that uses an inverted microscope (DMI4000B; Leica) equipped
with an HCX Plan Apochromat 100×/1.46 NA oil objective and an UltraVIEW
system (PerkinElmer; including a real-time confocal scanning head (CSU22;
Yokogawa Electric Corporation), solid-state laser lines and an
electron-multiplying charge coupled device camera (C9100; Hamamatsu Photonics)).
Stacks of z-series confocal sections were acquired at 0.3-µm intervals using
Volocity software (PerkinElmer).

All other micrographs were obtained by wide-field microscopy performed on
a DeltaVision platform (Applied Precision) composed of a customized inverted
microscope (IX-71; Olympus), a UPlan Apochromat 100×/1.4 NA oil objective,
a camera (CoolSNAP HQ2; Photometrics) and a colour combined unit illuminator
(Insight SSI 7; Social Science Insight). Images were acquired using softWoRx
v4.1.2 software (Applied Precision). Principally we imaged a single z-section with
the exception of data presented in Fig. 1b and Extended Data Fig. 1h, in which 6
sections with 0.5 µm spacing were acquired and the presented images are a
maximum projection of images deconvolved using the softWoRx v4.1.2 inbuilt
module. All conclusions based on imaging data were derived from at least three
independent observations.

Image processing and fluorescence intensity based quantifications were
performed in ImageJ (NIH). Supplementary Videos were converted from TIFF to
MOV format using the inbuilt MPEG-4 compression. For overnight time lapses
that exhibited drifting, presented images were aligned using the MultiStackReg
v1.45 plugin. All quantifications were performed on raw images.

The lifetime of the fusion focus was measured on time-lapse images as the
interval between the fluorescently labelled marker proteins first focalizing and
cell fusion. Results are reported with the box-and-whiskers plot in which centre
lines show the medians, box limits indicate the 25th and 75th percentiles,
whiskers extend 1.5 times the interquartile range from the 25th and 75th
percentiles, outliers are represented by dots and the Kruskal-Wallis P value is
reported.
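
The box-and-whiskers convention described above can be sketched as follows. This is an illustration only: it assumes linear interpolation between ranked values for the percentiles and assumes that whiskers are drawn to the most extreme data points lying within 1.5 times the interquartile range of the box limits (the Tukey convention). The plotting software used in the study may compute percentiles differently, and the function names and example lifetimes are hypothetical.

```python
from statistics import median


def percentile(sorted_values: list, q: float) -> float:
    """Percentile by linear interpolation between ranked values."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def box_stats(values: list) -> dict:
    """Median, box limits, whisker ends and outliers for one group."""
    data = sorted(values)
    if not data:
        raise ValueError("no measurements")
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical fusion-focus lifetimes in minutes.
print(box_stats([10, 12, 13, 14, 15, 16, 18, 40]))
```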

To distinguish the expression of mei3 from P and M genomes we performed
crosses between heterothallic strains in which only one partner carried the
mei3-sfGFP allele, and the other partner had the unlabelled, wild-type gene.
Quantification of Mei3-sfGFP fluorescence induction used cell fusion as time
zero, evidenced as the first time point with exchange of cytosolic RFP between
partners. The mean signal in individual partners before fusion, and thus before
mei3 induction, was considered as background signal. Mean fluorescence of the
whole zygote was recorded over time and the average of the indicated number of
cells is reported with shaded regions denoting mean ± standard deviation. The
post-fusion time for the mean zygotic intensity to reach 50 arbitrary units above
background was recorded for each zygote and the results are reported with the
box-and-whiskers plot in which centre lines show the medians, box limits indicate the
25th and 75th percentiles, whiskers extend 1.5 times the interquartile range from
the 25th and 75th percentiles, outliers are represented by dots and the
Kruskal–Wallis P value is reported.
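The threshold measurement described above (time after fusion for the zygotic signal to reach 50 arbitrary units above background) can be sketched as follows. This is an illustrative reading of the text, not the authors' script: the function names are hypothetical, time zero is taken to be cell fusion, the background is taken as the mean of the pre-fusion frames, and no interpolation between frames is attempted.

```python
from statistics import mean


def pre_fusion_background(times, intensities):
    """Mean signal over frames acquired before fusion (time < 0)."""
    return mean(value for t, value in zip(times, intensities) if t < 0)


def time_to_threshold(times, intensities, background, threshold=50.0):
    """First post-fusion time at which the background-subtracted signal
    reaches `threshold`; None if the trace never gets there."""
    for t, value in zip(times, intensities):
        if t >= 0 and value - background >= threshold:
            return t
    return None


# Hypothetical trace (minutes relative to fusion, arbitrary intensity units).
times = [-10, -5, 0, 5, 10, 15]
signal = [100, 102, 101, 130, 155, 190]
background = pre_fusion_background(times, signal)
print(time_to_threshold(times, signal, background))
```

One such value per zygote would then feed the box-and-whiskers summary.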

For Mi-sfGFP nuclear entry dynamics we used the last time point before
exchange of Mi-sfGFP between partners as time zero. The nuclei of the zygote
were clearly identifiable based on Mi-sfGFP signal alone, which enabled us to
outline the nuclear region required in these quantifications. The signal in the
P cell before cell fusion was considered as background. Mean fluorescence of each
nuclear region was recorded over time and the average of the indicated number of
cells is reported with shaded regions denoting mean ± standard deviation and the
Kruskal–Wallis P value is reported.

For quantification of dynamics of NLS-sfGFP-SynZip4, sfGFP-SynZip4 and
mCherry-SynZip3 we used the last time point before exchange of each fluorophore
between partners as time zero. The nuclear region was outlined on the basis of
the accumulation of the fluorescent signals in the central cell region. The signal
in the partner cell lacking the fluorophore before cell fusion was considered as
background signal. Mean fluorescence of each nuclear region was recorded over
time and the average of the indicated number of cells is reported with shaded
regions denoting mean ± standard deviation and the Kruskal–Wallis P value is
reported. Note that the mCherry-SynZip3-expressing cell also displays background
vacuolar fluorescence, probably owing to pre-fusion degradation of some
mCherry-SynZip3.

Statistical analysis. Exact sample size (n) is reported for each quantification.
Conclusions were drawn on experimental triplicates, except for Extended Data
Fig. 1f (which was performed quantitatively only once but qualitatively several
times), while Fig. 1e and Extended Data Fig. 2h were done twice. No cells were
excluded from the analyses, but zones of multi-layering were excluded from
quantifications because individual cells could not be tracked reliably. Measurements
were taken from distinct samples. To calculate P values for samples that we assumed
followed a normal distribution, we used the two-tailed Welch test calculated by the
Excel T.TEST() function. We also report Kruskal–Wallis P values (p^{K-W}) that
were obtained by the Matlab ranksum() function set for two-sided comparisons of
samples. All P values are given as exact values except for Extended Data Fig. 4b
in which P values are greater than 0.05. No Bayesian analysis was performed. No
hierarchical and complex designs are present. No explicit estimates of effect size
have been performed, yet all data are shown in absolute or normalized values so
effect sizes are evident to the reader.
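For readers unfamiliar with the Welch test mentioned above, the following Python sketch computes its test statistic and the Welch–Satterthwaite degrees of freedom using only the standard library. It is an illustration under stated assumptions, not a re-implementation of the Excel or Matlab calls used in the study: the function name and sample values are hypothetical, and converting the statistic into a two-tailed P value additionally requires the Student t distribution (for example via a statistics package), which is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's unequal-variance t statistic and its approximate degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch–Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical measurements for two strains, invented for this example only.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 3), round(dof, 2))
```

Unlike Student's t-test, this form does not assume that the two groups share a variance, which is why the degrees of freedom are generally not an integer.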

Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Data availability. The datasets generated and/or analysed during the current study
are available from the corresponding author on reasonable request.
Extended Data Fig. 1 | Transient cell fusion results in ectopic meiosis
in the P cell. a, Box-and-whiskers plot (see Methods) reports the
lifetime of the fusion focus visualized by the indicated fluorophores in
homothallic wild-type and pak2Δ mating cells. Kruskal–Wallis P value is
reported. b, Wild-type control for data presented in Fig. 1b. Time-lapse
of homothallic wild-type mating cells expressing GFP-α-tubulin in green
and nuclear marker Uch2-mCherry in magenta. Note, in the
outlined mating pair, meiotic spindles (arrows) and spores (arrowhead).
c, Micrograph of cells in mating mixtures of homothallic wild-type and
pak2Δ strains with lys1 chromosomal loci labelled by the LacO:GFP-NLS-LacI
system in green and nuclei visualized by Uch2-mCherry in magenta.
Arrows point to spores that lack the lys1 locus. d, Wild-type control for
data presented in Fig. 1d. Note the formation of the fusion focus (empty
arrowhead), labelled by Myo52-GFP in the M cell and Myo52-tdTomato
in the P cell, followed by exchange of cytosolic GFP expressed from the
P-cell-specific p[…] promoter and spore formation (full arrowheads).
e, Time-lapse showing mating of homothallic pak2Δ cells expressing GFP
from the M-cell-specific p[…] promoter. Note that the GFP exchange is
followed by fusion pore closure and build-up of the GFP signal only in the
M cells (arrows), whereas P cells form spores (arrowheads). f, P and M
cells with indicated genotypes were induced to mate on solid medium. Two
days later, we quantified frequencies of phenotypes indicated in the insets
of Fig. 1a, n > 200. g, Quantification of sporulation phenotypes of wild-type
and pak2Δ diploid cells upon nitrogen starvation, n = 3 experiments
with ≥ 500 cells each. h, Time-lapse showing transient fusion between
wild-type P cells and GFP-expressing fus1Δ M cells. Persistent GFP signal
difference between partners indicated fusion pore sealing. Note spore
formation (arrowhead) only in the wild-type P cell. i, Time-lapse showing
mating of h+ fus1Δ mutant P cell and GFP-expressing wild-type M cells.
Note that transient fusion between indicated partners is followed by
resealing of the fusion pore, as visualized by continued accumulation of
GFP only in the M cell, and sporulation (arrowheads) in the P cell.
Extended Data Fig. 2 | Induction of the meiotic inducer Mei3 occurs
first from the genome of the cell expressing the homeodomain protein
Pi. a, b, Mei3-sfGFP signal increases more rapidly after fusion when
expressed in the cell expressing Pi. The left panels show individual traces
that quantify the fluorescent signal from zygotes of heterothallic wild-type
(a) or Pi–Mi-swapped (b) cells in which only one partner has mei3 tagged
with sfGFP, as indicated (see Fig. 2c for average curves). Curves were
aligned to fusion time defined by the entry into the M cell of cytosolic
mCherry (expressed in the P cell). Box-and-whiskers plots in the bottom
panels (see Methods) analyse data shown in the top panels, with
Kruskal–Wallis P value displayed. Right panels show an example time-lapse of
the cells used in the quantification. c, Biological replicates of chromatin
immunoprecipitation of Pi and Mi reported in Fig. 2. d, e, Time-lapse
showing mating cells with swapped Pi and Mi coding sequences. In both
cases the P cell expresses cytosolic mCherry under control of the p[…]
promoter. d, Heterothallic pak2Δ strains, n = 14. e, Mating of otherwise
wild-type M cells with fus1Δ P cells, n = 5. Transient fusion, visualized
by fluorescence exchange, leads to haploid meiosis in the M partner
(arrowheads indicate spores). The P cell proceeds to mate (e) or attempts
to mate (d) with the M cell. f, g, Quantification of mating and sporulation
efficiencies in heterothallic wild-type and strains with swapped Pi and Mi
coding sequences induced to mate on MSL-N agar plates for 28 h.
h, Spores in asci produced by mating heterothallic wild-type or strains with
swapped Pi and Mi were micro-dissected and the number of spores that
developed colonies was counted.
Extended Data Fig. 3 | Meiotic regulators Pi and Mi exhibit asymmetry
in localization in early zygotes. a, Time-lapse showing mating of
heterothallic cells expressing nuclear marker Uch2-mCherry (magenta)
with the M cell co-expressing sfGFP-tagged endogenous Mi. Note the
cytosolic Mi-sfGFP signal in M cells that rapidly accumulates in the
P nucleus after partner fusion. The punctate signal at the cell cortex in the
green channel is background, probably mitochondrial, fluorescence.
b, N-terminally sfGFP-tagged endogenous Pi exhibits a weak nuclear
staining in P cells during mating in homothallic cells co-expressing
nuclear marker Uch2-mCherry (magenta). Arrows point to nuclear
sfGFP-Pi signal in the P cell (middle panel) and the zygote (bottom panel).
c, Time-lapse showing homothallic, fusion-defective fus1Δ cells
co-expressing N-terminally sfGFP-tagged endogenous Pi and nuclear
marker Uch2-mCherry. Arrows point to Pi nuclear accumulation.
d, Time-lapse showing transient fusion between pak2Δ cells expressing
NLS-sfGFP-SynZip4 and mCherry-SynZip3. Note the accumulation of
both fluorophores in the M nucleus and minimal transfer of the green
fluorophore into the P cell that eventually sporulates (arrows). e, Left panel
follows the mating of wild-type cells expressing cytosolic sfGFP-SynZip4
and mCherry-SynZip3. Right panel quantifies both fluorescent signals
in the central region of the two cells corresponding to the two nuclei, as
labelled on the scheme, and is presented as in Fig. 3.
Extended Data Fig. 4 | Rapid induction of Mei3 is required to suppress
mating in zygotes and prevent polyploid formation. a, Delayed
Mei3-sfGFP signal when Mi expression is delayed. The left panel shows
individual traces of the green fluorescent signal in zygotes produced by
mating either wild-type or MiΔ p[…] M cells with P cells encoding
Mei3-sfGFP. Curves were aligned to fusion time defined by entry into the
M cell of cytosolic mCherry (expressed in the P cell). Box-and-whiskers
plot in panel below (see Methods) analyses data shown in the top panel,
with Kruskal–Wallis P value displayed. Right panels show representative
time-lapse of cells used in the quantification. b, Heterothallic strains with
indicated genotypes were mixed and induced to mate on MSL-N agar
plates. Charts report mating and sporulation efficiencies quantified after
two days incubation, P > 0.05 (Welch test) for all comparisons to wild
type; n = 3 experiments with n > 200 cells each. c, Mating of pak2Δ cells
in which mei3 has been deleted from the P partner and cytosolic GFP is
expressed in M cells. Transient cytosolic exchange between P and M cells
never induced sporulation (n > 10). The partners shown continue mating
with each other until the P cell eventually switches partner. d, Time-lapse
shows pak2Δ P cells expressing cytosolic GFP mating with pak2Δ mei3Δ
double mutant M cell. Note that transient fusion is followed by spore
formation (n > 10) in both indicated P cells. Arrowheads point to spores.
e, Time-lapse showing that mating of M cells with Mi expressed from the
p[…] promoter with wild-type P cells results in zygotes undergoing fusion
with additional partners. Fusion events are visualized through exchange
of cytosolic mCherry (expressed in P cell). f, Time-lapse showing
that zygotes undergo repeated fusion, evident from exchange of cytosolic
mCherry, in crosses in which mei3 was specifically deleted in the
P cell.
Extended Data Fig. 5 | Mei3-Mei2, but not sme2-Mei4, signalling
suppresses mating in zygotes and prevents polyploid formation.
a, Time-lapse showing re-fusion of mei3Δ homothallic cells. b, Incidence
of re-fertilization and zygotic growth in mating mixtures of homothallic
strains with indicated genotypes. c–g, Time-lapse showing re-fusion of
PiΔ, MiΔ, mei2Δ, mei2** and mei2Δmei3Δ homothallic cells, as
indicated. In c–f, fusion is evidenced through transfer of M-cell-expressed
GFP. In g, fusion is evidenced through transfer of M-cell-expressed sfGFP
and P-cell-expressed mCherry. h, i, Time-lapse showing diploid zygote
formation in sme2Δ and mei4Δ homothallic cells, as indicated. Note that
zygotes do not exhibit any growth or attempt mating with other cells.
Extended Data Fig. 6 | Schematic overview of growth conditions and
mat1 locus manipulations. a, Overview of growth conditions used in
this study (adapted with permission from ref. […]). See Methods for details.
b, Schematic of mat1 locus manipulations. The wild-type homothallic
P cell and M cell mat loci are presented in the middle of the scheme. Red
boxes denote H2 and H1 homology boxes identical between the three mat
loci. Blue boxes represent Pc and Pi genes expressed exclusively in the
P cell and encoded by the mat2 locus, and yellow boxes denote Mc and
Mi genes encoded by the mat3 locus and expressed in M cells only. Gene
expression from mat2 and mat3 loci is inhibited by heterochromatin state
(grey box). The sequences at the mat1 locus are derived from sequences
at the other two mat loci during mating-type switching in homothallic
strains (blue and yellow arrows), which relies on the H1 and H2 homology
boxes. Transformation with a DNA fragment carrying the mutant H1Δ17
homology box (shaded red box) results in cells that are unable to switch
mating type.
Extended Data Fig. 7 | Schematic overview of mat2 and mat3 loci
manipulations and obtained mat loci mutants. a, Schematic of mat2 and
mat3 loci manipulations. Wild-type homothallic strain is presented on
the top. Red boxes denote H2 and H1 homology boxes identical between
the three mat loci. Blue and yellow boxes represent genes encoded by the
mat2 and mat3 loci, respectively. Recombination at the mat2 and mat3
loci is inhibited by heterochromatin state (grey box). Restriction enzyme
sites denote positions at which prototrophic genes have been integrated in
the TPL and PG3089 strains depicted below, which are used to manipulate
the mat2 and mat3 loci, respectively. Ablation of genes necessary for
heterochromatin formation (clr[…] or clr4) enables targeting the mat2
and mat3 loci by homologous recombination. These deletions were
subsequently crossed out. b, Schematic representation of mat loci mutants
obtained in this study. Colour-coding is as in a and Extended Data Fig. 6.
Extended Data Fig. 8 | Nuclease treatment removes majority of DNA
from cell lysates. a, b, DNA from fission yeast (a) and bacterial (b) lysates treated
with benzonase as indicated. These samples correspond to those shown in Fig. 2c, respectively.
Teaming with bright ideas


Four researchers share their tips for building and maintaining international collaborations. 


The number of international research
collaborations continues to rise, and
for good reason: it's easier than ever to
connect with overseas colleagues, and doing so
can be an effective way to share and advance
knowledge. Nature spoke to four scientists who
routinely participate in such teamings and who
have studied how to create and look after them.


KATHRIN ZIPPEL 
Be inclusive 


Sociologist, Northeastern University,
Boston, Massachusetts.


In our analysis of a 2006 US National Science
Foundation survey of PhD holders in academic
positions, we found that one-third of men
reported taking part in an international
collaboration, compared with only one-quarter of
women (L. M. Frehill and K. Zippel J. Wash.
Acad. Sci. 97, 49–69; 2011). Women also
publish less often with international colleagues
than men do, according to a study that
analysed research performance in 12 regions
and 27 disciplines, over 20 years. Data from
European studies show that women's
international mobility starts to fall when they reach
the postdoc level.

My recent book (K. Zippel Women in Global
Science: Advancing Academic Careers Through
International Collaboration; 2017) drew on
interviews carried out between 2007 and 2015
with more than 100 women, many of whom
were being held back by gender-specific
'glass fences': organizational and
structural barriers that control access to resources
and opportunities.

It comes down to money and time. I found
that funding is a major obstacle for women,
whether for attending conferences to find
collaborators or for travelling to establish
collaborations. Women are often still the primary
carers in their families, so travel comes with an
extra financial burden in the form of care costs
for children or other family members.

Women shouldn't hesitate to ask for what
they need to scale a glass fence. For example,
one woman I interviewed, who had four
children, requested help to cover child care
whenever she received an invitation to speak
or work abroad. She went only if it was
provided. Women can also turn to international
funders, but those vary in terms of how they
acknowledge caring responsibilities.

Women also tend to be in teaching roles,
rather than in research-intensive ones, and are
overburdened with service obligations, such as
serving on faculty recruitment committees or
organizing lecture series. This means they have
less time for research collaborations.

And unless international collaboration
yields results that are valued at an institution, it
will be invisible. Some institutions view
international collaborations as a frivolous pursuit
that's likely to count less towards tenure than
are other endeavours. Everyone should
check to what degree global engagement
might or might not factor into promotions.
Still, women should think of international
research collaborations as a way of expanding
their academic circles and doing the kind of
internationally recognized science that gets
published in high-profile journals, and that
will help to secure tenure.

Women who mainly teach should seek
institutional funding so that they can travel
abroad with students to conduct research.
They can identify potential international
collaborators by searching through journals,
associations and government and
organizational reports to find people who do similar
research. It's also a good idea to talk to
international colleagues on campus or at
neighbouring institutions about their research plans, in
case there are overlapping or complementary
interests.

I also encourage men to be allies, to make
sure that women get invited to conferences and
to ensure a safe, respectful environment at such
events. If we wait for women alone to change
the world, we'll be waiting a long time.


RICHARD DE GRIJS 
Identify potential 
pitfalls 

Astrophysicist and associate dean
for global engagement at Macquarie
University, Sydney, Australia.


At any given time, I am involved in five to ten
collaborations. Some might have only one or
two people. Most astronomy research,
including mine, is driven by small collaborations.
It's important that members of a team get on,
otherwise the team will go nowhere. Often, I'll
meet someone at a conference and we'll end
up working together. Alternatively, I'll need
someone else's expertise, or they'll need mine.

Multinational collaborations, which often
require government investment, come about in
different ways. But as long as the collaborators
respect each other scientifically, things should
be fine. As I wrote in 'Ten simple rules for
establishing international research collaborations'
(R. de Grijs PLoS Comput. Biol. 11, e1004311;
2015), you want collaborators who are
dependable, reliably meet deadlines and have a good
reputation. Working with such people is the
best way to protect your own reputation.

I belong to a collaboration of 20–30 members
who are part of an observational campaign at
the European Southern Observatory in Chile.
I was invited to join by personal contact. The
team has been in place for around 9 years and
has published more than 30 papers.

It is essential that all the members of a team
meet in person, ideally at least once every two or
so years; otherwise, connections are watered
down and the team becomes less effective.


Meetings can also help to address any cultural
differences. I'm a Dutch scientist who has
collaborated extensively in the United States, the
United Kingdom and China. Students in some
nations might look up to you and treat you as
the person who knows everything, whereas
in other nations, researchers are used to being
questioned. And deadlines, particularly if not
strict, can be taken more loosely in some
cultures than in others. In larger collaborations,
and in fields such as astrophysics, which have
proprietary information, it can help to have a
formal agreement about what's expected of the
team and how data will be used.


JOSE ANTONIO CASTILLO 
MORALES 
Reach out 


Biologist, Yachay Tech University,
Urcuquí, Ecuador.


About 80% of publications in Ecuador result
from international collaboration. For small
Central and South American countries such
as Ecuador and Paraguay, international
collaboration is extremely important.

Few native Ecuadorean scientists have
doctorates and do research, but the number
of people with PhDs is rising as the national
government invests in sending young people
abroad for their degrees. I work at Yachay Tech
University, which opened in 2014. Roughly
80% of the faculty members there are from
other countries. We are funding fellowship
programmes to train more PhD students in hard
sciences. We require students to learn English,
in part because that is the language of science
publications. Students are wise to build solid
collaborations with their advisers overseas, and
to keep those going when they return. Ecuador
has cultivated strong ties with programmes in
Belgium, Spain and the United States.

That said, most of our international
collaborations begin because a foreign partner
is seeking one, particularly in areas related to
biodiversity, agriculture and medicine. Ecuador
is rich in plants and animals, and continues to
do research on Zika, malaria and dengue.

Researchers at Ecuador's big universities,
particularly English speakers, are the most
likely to be interested in joining a collaboration.
But, because many Ecuadoreans do not speak
English, it might be best to send them a message
in Spanish, for example using Google Translate,
to increase your chances of a response.

By law, students who have fellowships or
studentships funded by the national
government are required to come back to Ecuador,
either to teach or to help the country in some
other way using their knowledge. The
government is trying to open posts in universities
and companies to attract these newly qualified
master's and PhD holders.
LEIF OLTEDAL
Formalize 
agreements 


Neuroscientist, Global ECT-MRI 
Research Collaboration, University of 
Bergen, Norway. 


In 2013, while using magnetic resonance
imaging (MRI) to study neurological responses
to electroconvulsive therapy (ECT) as a
treatment for depression, I realized I was often
being scooped by other groups doing the same
thing. Our research group in Bergen
established a collaboration with Anders Dale, a
neuroscientist at the University of California,
San Diego. Next, I searched PubMed to
identify all the other groups that were using
radiology before and after ECT. Anders agreed
that we should invite other groups to join a
collaboration to analyse data longitudinally.

In 2014, I contacted 12 groups, asking them
to collaborate. Together, we could analyse data
from 150–200 patients instead of the more
typical 10–20. Twenty minutes after I sent
the first e-mail, I got a reply from one group
saying that it was a great idea. More people
answered over the next few days. We were able
to invite everyone (13 international
participants from 10 sites) to Bergen for a 2-day
meeting in June 2015. I was surprised by the
number of groups that expressed interest. Our
collaboration with Anders was crucial to the
response. I'm not sure that we would have got
such numbers had the invitation come just
from me, then a postdoc in radiology.

Before our collaboration's first meeting
in 2015, we had four group conversations
in which people introduced themselves and
spoke about the research projects they were
working on. We established a pact for the
collaboration that included rules on
authorship and our data-sharing agreement. We
have not had direct conflicts, but we have
discussed how to manage potential conflicts
should they arise.

For example, we created a shared
repository for project data on a common server.
Any collaborator who has contributed data
can suggest what kinds of analyses he or she
wants to do. If the team agrees, the person can
go ahead and do the analyses. We decided
that collaborators retain the right to their
own data. Therefore, if one of us wants to do
an analysis, anyone who doesn't think it's a
good idea can request that their data not be
included. We have been able to resolve any
issues arising when two groups want to do
the same analysis. For example, we discuss
whether the groups can collaborate, or do
different analyses sequentially.


INTERVIEWS BY VIRGINIA GEWIN 


Interviews have been edited for length and clarity.


FUTURES SCIENCE FICTION


THE PREPRINT

Time to publish.


BY J. W. ARMSTRONG


It is not widely known that there's a
machine at the centre of the Universe,
the sole function of which is
to create more time.

Now, a physicist might say 'time'
is the independent variable in the
equations of motion. A poet might
wax lyrical about how precious it is to
young lovers. Perhaps a mystic would
speak of things unknowable. Most of
us would say it's the stuff that moves
too slowly when we're young and too
quickly when we're old. It's the ultimate
asset: the thing, especially near
the end, that everyone wants more of.

[…]y consumed, well,
all the time. Hence the necessity of
the machine, to continually produce
more of it. The machine has been
around from the beginning, faithfully
creating all the time the Universe needs. It
will continue to do so for the rest of eternity.

I inferred the existence of the machine in
(to be both immodest and truthful) a stunning
display of mathematical virtuosity.
I was investigating aspects of the standard
model when I hit upon these beautiful
equations. I was thunderstruck by their
necessity, and their implications. This was the
biggest discovery in human history and, of
course, I had to get proper credit for it.

The obvious thing was to publish quickly.
After all, publication and priority are how
academics keep score. But, following custom,
I first gave a seminar at the university.
My colleagues' reaction: “A machine? That
creates time?” They left shaking their heads,
confident I had gone bonkers. None could
see the machine's existence followed
inevitably from the mathematics. With wounded
pride (and diminished regard for my
colleagues), I withdrew to my lab.

I studied the equations. I realized they
implied a device could be built for transport
(first me, then my doubters) to see the
machine. The ultimate proof! Construction
proved difficult but feasible. Details are
in my lab notebooks. For reasons that will
become clear later, my preprint on this topic
will not be immediately forthcoming.

When I finished the device (and with
only minor trepidation) I energized it
and was immediately there.

The machine is, of course, sentient.
It also does not like
visitors. Or even the idea that anyone knows
it exists. When I arrived, I was not welcome.

I wondered how the machine might
communicate. Telepathically? Via avatar?
Anticlimactically, it was just in a disembodied
conversational voice.

Or perhaps telepathy was involved,
because it answered my initial question
before I asked it. “You're not the first to infer
my existence,” the machine commented
sourly. “But you are the first to devise a way
to find me. Previously only mystics could
divine my being. They explained me as an
elemental property of the Universe. But,
lacking proof, they were judged to be mad. I
was safely forgotten for a time.”

It paused. “You'd have got better
acceptance from your colleagues describing me as
a quantum field ('an elemental property
of the Universe') rather than a clockwork
mechanism.”

Maybe it had a point. But I didn't like
being lectured, even by an entity 13.8 billion
years old. Fascination with unfolding events
kept me silent, however.

The machine sighed. “Being human, I
assume you seek some boon. Immortality?
Transport back in time to relive your youth?
Even if I were so inclined (which I'm not),
such actions are on The List, prohibited
except in extraordinary circumstances.”

I objected that I had come with purity of
motive, just to understand how the Universe
works. The machine seemed sceptical.
“Really? Well, good for you.” It paused.
“Since you're solely interested in knowledge,
I'll answer the obvious questions. Is the
Universe causal? Approximately. Are
events predetermined? Partially. Do
humans have free will?” It chuckled,
as if recalling a private joke. “Sure,
why not.”

“But to the current problem: your
arrival here is unfortunate. I can't
simply return you, pretending this
never happened. You academics can't
control yourselves. You'd publish.
Others would read your paper, build
transport devices and, ugh, there
would be tourists.”

It paused thoughtfully. “Of course,
the obvious solution is to terminate
your existence now...”

I didn't like where this was going,
but before I could object the machine
continued, “However, I am not without
sympathy for the plight of ephemeral
beings.” It cleared its throat and
began lecturing me again. “Knowledge
and the ability to act on knowledge are
different. Your transport device required
sophisticated technology. And without the
ability to travel here (to prove I exist)
you're just a crazy scientist with an
untestable idea. Your transport capability, therefore,
advances the problem to an 'extraordinary
circumstance'. I'll resolve this by sending you
back in time to a pre-technical era. There
you can live out your days, jabber about
your equations, and be hopelessly unable to
do anything else.”

It paused. “Perhaps you'll even tell tales
[…] of course, will be […]”

I objected that this […]
machine replied mildly […]
let that sink in and then added,
“[…] be flattered: your discovery is important
enough to justify backward-going time
travel to keep it a secret!” It then muttered,
“Of course, I'll be doing paperwork forever
to rationalize the required causality
violation.” Recovering, it continued briskly. “Do
you have a preference for your exile?
Neanderthal Europe 50,000 years ago? China
3,000 years ago? Carthage 2,500 years ago?”

I sighed, resigned to learning
Phoenician. As the machine prepared to dispatch
me, however, I was still thinking about how
I might get this published. A 2,500-year lead
time does emphasize the 'pre' in preprint.
Still, if I could get the equations inscribed
on stone tablets...


J. W. Armstrong works at a large laboratory
in Southern California.


